1 Introduction

The notion that evolution can be viewed as a hill-climbing process in an adap- 
tive landscape was introduced in 1932 by Sewall Wright [1], and remains one 
of the most powerful images in evolutionary biology [2]. Since the discovery 
of the molecular structure of genes it has been clear that the substrate over 
which the adaptive landscape should be properly defined is the space of ge- 
netic sequences [3]. Nevertheless, apart from a few landmark papers [4, 5], 
adaptation has not been in the focus of the theory of molecular evolution, 
which instead has concentrated on the effects of stochastic drift in a neutral
(flat) fitness landscape [6]. This situation is presently changing [7, 8].
Long-term evolution experiments on microbial populations [9] are beginning 
to produce a wealth of data, on the phenotypic as well as on the genotypic 
level, which make it meaningful to ask precise questions about the timing and 
size of adaptive events, and what they can tell us about the structure of the 
underlying adaptive landscape. 

In this chapter we introduce a class of sequence-based models of adapta- 
tion, which have been the subject of much recent interest in theoretical pop- 
ulation genetics as well as in biologically inspired statistical physics. These 
models describe the behavior of a population of haploid, asexual individu- 
als, each characterized by a genetic sequence of fixed length, in an adaptive 
landscape which assigns a fitness value to each genotype. The population is 
exposed to the competing influences of mutations, which tend to increase the 
genetic variability, and selection, which focuses the population in regions of 
high fitness. The dynamics is deterministic, which implies that the genetic drift 
induced by the stochastic sampling noise in finite populations is neglected, and 
the adaptive landscape is generally taken to be time-independent. In view of
the vastness of the field, the selection of topics is unavoidably biased by the
interests and preferences of the authors. For a more comprehensive coverage 
we refer the reader to several recent review articles [10, 11, 12, 13, 14]. 

The chapter is organised as follows. In the next section the key concepts 
and their mathematical representation are introduced, and several types of
mutation-selection dynamics are described, leaving the form of the adaptive
landscape unspecified. In Sect. 3 we consider simple fitness landscapes comprising
a single adaptive peak or possibly two competing peaks. Here the central
theme is the error threshold phenomenon, which refers to the sudden delocal- 
ization of the population from the fitness peak as the mutation rate increases 
beyond a critical value. As is described in this book in the chapter by Ester 
Lazaro, the error threshold and the related concept of a quasispecies play an 
important role in the population dynamics of RNA viruses and in the devel- 
opment of antiviral strategies. Due to its similarity to a phase transition, the 
error threshold has been thoroughly analyzed using a range of methods from 
statistical physics. We give an elementary derivation of the critical mutation 
rate, and describe several modifications of the basic model, including fitness 
peaks with a variable amount of epistasis, diploid populations, semiconserva- 
tive replication, and time-dependent landscapes. 

Section 4 is devoted to complex fitness landscapes consisting of many peaks 
and valleys. Such landscapes can be modeled by ensembles of random func- 
tions, which links this subject to the statistical physics of disordered systems. 
Whereas so far the discussion has been restricted to static or steady state 
properties, time-dependent aspects of mutation-selection dynamics are discussed
in Sect. 5. Finally, experimental realizations (in vitro as well as in
vivo) of the models are described in Sect. 6, and some concluding remarks are
presented in Sect. 7. 

2 Basic concepts and models 

In the following discussion, the constituents of a population carry a string
$\sigma = \{\sigma_1, \ldots, \sigma_N\}$ where each of the $N$ letters $\sigma_i$ is taken from an alphabet
of size $\ell \geq 2$. In classical population genetics, $\sigma$ represents the configuration
of alleles (variants of a gene) $\sigma_i$ located at gene loci $i$. Typically, one-locus,
$\ell$-allele models where $\ell$ can take values from two (wild type and mutant)
to infinity (continuum of alleles) have been considered [15]. In the language
of population genetics, we are here concerned with multilocus models with
complete linkage [11].

At the molecular level, $\sigma$ represents the genetic sequence of an individual.
For DNA (RNA) based organisms, $\ell = 4$, corresponding to the nucleotide bases
A, T (U), C and G, and the sequence length $N$ varies from a few thousand
for viruses to about $10^9$ for humans. Thus, the total number $4^N$ of sequences
available is hyperastronomically large. The minimum value of $\ell = 2$ can be
obtained by lumping A and G together as purines and C, T and U as pyrimidines.
The sequences may also represent proteins composed of a few hundred
amino acids taken from an alphabet of size $\ell = 20$ [3].

2.1 Fitness, mutations, and sequence space 

The essence of natural selection is that the relative reproductive success of an
individual determines whether the corresponding genotype becomes more or
less abundant in the population. The fitness of an individual is a quantitative
measure of its reproductive success; depending on the context, it may be
defined as the viability of an organism, i.e. the probability to survive to the
age of reproduction [2], the replication rate of a microbe, the binding affinity
of regulatory proteins to DNA [16] or of antibodies produced by B-cells to 
pathogens [17], the program execution speed for digital organisms [18], or the 
cost function in an optimization problem [19]. 

In principle, one should assign fitness to the phenotype which then should 
be related to the genotype; unfortunately, the genotype-phenotype map 
is complicated and largely unknown except for a few cases [20] (see also 
Sect. 4.1). This problem is usually outflanked by associating the fitness $W(\sigma)$
with the genotype itself and defining it to be the expected number of offspring
produced by an individual with sequence $\sigma$ [14]. This definition applies to the
case of discrete generations, and is known as Wrightian fitness. To pass to
continuous time dynamics we write

$$W(\sigma) = \exp[w(\sigma)\Delta t] \approx 1 + w(\sigma)\Delta t, \quad \Delta t \to 0, \qquad (1)$$

where $\Delta t$ is the generation time and $w(\sigma)$ is referred to as the Malthusian
fitness [10]. For future reference we note that multiplication of the Wrightian 
fitnesses by a common factor implies a constant additive shift of the Malthu- 
sian fitnesses. 
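
This remark can be checked in one line from the exponential relation between the two fitnesses. As a sketch, with $C > 0$ denoting an arbitrary common factor (a symbol introduced here only for illustration):

$$C\,W(\sigma) = \exp[\ln C + w(\sigma)\Delta t] = \exp\Big[\Big(w(\sigma) + \frac{\ln C}{\Delta t}\Big)\Delta t\Big],$$

so that every Malthusian fitness is shifted by the same amount, $w(\sigma) \to w(\sigma) + \ln C/\Delta t$, and fitness differences between genotypes are unaffected.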



Organism                      Genome size    Rate per base    Rate per genome

Bacteriophage Qβ              4.5 × 10^3     1.4 × 10^-3      6.5
Vesicular stomatitis virus                                    3.5
Bacteriophage λ               4.9 × 10^4     7.7 × 10^-8      0.0038
E. coli                       4.6 × 10^6     5.4 × 10^-10     0.0025
C. elegans                    8.0 × 10^7     2.3 × 10^-10     0.018
Mouse                         2.7 × 10^9     1.8 × 10^-10     0.49
Human                         3.2 × 10^9     5.0 × 10^-11     0.16



Table 1. Spontaneous mutation rates for various organisms taken from [21]. The 
first two organisms have RNA as genetic material and the rest are DNA based. 



In the next subsection, we will discuss models in which mutations occur
either as copying errors in the genetic material during cell division or are induced
by some external influences. In Table 1, the spontaneous mutation rates
for some organisms are shown. They differ by orders of magnitude between
RNA-based viruses, whose mutation rate per genome exceeds unity, and DNA-based
organisms, which can afford the complex replication machinery needed
to reduce the mutation rate to a much lower level. It has been suggested
that the high mutation rate of RNA viruses, rather than being due to the
lack of correction mechanisms, may constitute an adaptation to the rapidly
fluctuating environments that these organisms encounter (see the chapter by
E. Lazaro). Within the DNA organisms, the mutation rate per base is seen to
decrease with increasing sequence length, and the mutation rate per genome is 
roughly constant for similar organisms. However, mutation rates per genome 
in higher eukaryotes become comparable to those of DNA-based microbes if 
referred to the effective genome size, which excludes non-coding regions [21]. 

Before we describe the mutation-selection models, we need to specify the
space of sequences on which the evolutionary dynamics operate. The structure
and geometry of the sequence space depend on the nature of the allowed
moves that change one sequence into another. In the simplest case of a genome
of fixed length $N$ subject only to point mutations (to which we will restrict
ourselves throughout this chapter), the natural choice for the sequence
space is the Hamming space with $\ell^N$ points. Two sequences $\sigma$ and $\sigma'$ are
separated by the Hamming distance $d(\sigma,\sigma')$ which is given by

$$d(\sigma,\sigma') = \sum_{i=1}^{N} (1 - \delta_{\sigma_i,\sigma'_i}). \qquad (2)$$

The Hamming distance simply counts the number of letters in which the two
sequences differ, that is, the number of point mutations needed to mutate $\sigma$
into $\sigma'$ (and vice versa). The Hamming space for $N = 3$ and $\ell = 2$ is shown
in Fig. 1 (left). The sequences are located at the corners of a cube, which for
general $N$ becomes the $N$-dimensional hypercube.
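
The geometry just described is easy to explore numerically for small sequence lengths. The following is a minimal sketch, not part of the original treatment; the function names `hamming_distance`, `sequence_space` and `neighbors` are our own illustrative choices, and the brute-force enumeration is only feasible for short sequences.

```python
from itertools import product


def hamming_distance(seq_a, seq_b):
    """Number of positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def sequence_space(alphabet, length):
    """All sequences of the given length over the alphabet (Hamming space)."""
    return list(product(alphabet, repeat=length))


def neighbors(seq, space):
    """Sequences reachable from seq by a single point mutation."""
    return [other for other in space if hamming_distance(seq, other) == 1]


# Binary sequences of length 3: the corners of a cube.
space = sequence_space((-1, 1), 3)
corner = (1, 1, 1)
print(len(space), len(neighbors(corner, space)))
```

For the binary cube the script reports 8 sequences and 3 nearest neighbors per corner; for a general alphabet each sequence has as many neighbors as there are single-letter substitutions.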

To give an example of a sequence space with a somewhat different geometry,
we consider the Graph Bipartitioning Problem (GBP) [22] (see also
Sect. 4.3). In the GBP, as the name suggests, the problem is to partition a
graph with given connections into two sets A and B with equal numbers of
vertices, such that the number of connections between A and B is minimised.
A bipartitioning configuration is mapped onto a binary sequence by setting
$\sigma_i = 1$ if the vertex $i$ belongs to set A, and $\sigma_i = -1$ otherwise. Thus the sequence
space consists of those $\binom{N}{N/2}$ configurations $\sigma$ for which $\sum_i \sigma_i = 0$, a subset
of the Hamming space. An elementary move exchanges a pair of vertices
between the sets A and B. Two configurations are said to be at a distance
$d_{\mathrm{GBP}}(\sigma,\sigma') = d$ if they can be related by $d$ exchange moves, so that $d_{\mathrm{GBP}}$
is half of the Hamming distance defined above. The GBP sequence space for
$N = 4$ is shown in Fig. 1 (right).

The Hamming space as well as the GBP sequence space are symmetric and
regular graphs, in the sense that each vertex has the same number of neighbors
and all vertices are equivalent. This is no longer true if mutations that change
the sequence length through deletions, insertions or gene duplications are
taken into account. Genetic recombination, which is of crucial importance for
sexual reproduction, leads to additional complications, because it introduces
moves which involve pairs of sequences [19].

Fig. 1. Examples of sequence spaces. Left panel: Hamming space of binary sequences
of length $N = 3$. Right panel: Graph bipartitioning problem space for $N = 4$. In
both cases $\sigma_i = \pm 1$ and nearest neighbors are connected by lines.

We return to the case of point mutations acting on sequences of fixed
length $N$, and proceed to derive an expression for the mutation probabilities
taking one sequence to another. If the mutations change a letter $\sigma_i$ to any one
of the other $\ell - 1$ values with a probability $\mu$, independent of the identity of
the letter and the other letters in the sequence, then the probability to mutate
a sequence $\sigma'$ to $\sigma$ can be written as

$$p(\sigma' \to \sigma) = \left(\frac{\mu}{\ell - 1}\right)^{d(\sigma,\sigma')} (1 - \mu)^{N - d(\sigma,\sigma')}. \qquad (3)$$

Obviously, this probability is the same for all $\alpha_d$ sequences which are at a
constant Hamming distance $d$ from sequence $\sigma$, where $\alpha_d$ is given by

$$\alpha_d = \binom{N}{d} (\ell - 1)^d. \qquad (4)$$

This can be seen by noting that there are $\binom{N}{d}$ ways of choosing $d$ letters at
which a sequence differs from $\sigma$ and each of these $d$ letters can take $\ell - 1$
values. For large $N$, most of the sequences are located in a belt of width
$\sim \sqrt{N}$ around the distance $d_{\max} = N(\ell - 1)/\ell$ away from $\sigma$. Using (4), it is
easily checked that $\sum_{\sigma} p(\sigma' \to \sigma) = 1$.
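
The normalization and the location of the most populated shell can be verified numerically. The sketch below is only an illustration under the assumptions stated in the text (independent sites, total per-site mutation probability shared equally among the other letters); the function names are ours.

```python
from math import comb


def shell_size(n_sites, distance, alphabet_size):
    """Number of sequences at a given Hamming distance from a reference."""
    return comb(n_sites, distance) * (alphabet_size - 1) ** distance


def mutation_probability(distance, n_sites, mu, alphabet_size):
    """Probability of mutating into one specific sequence at that distance."""
    per_target = mu / (alphabet_size - 1)
    return per_target ** distance * (1.0 - mu) ** (n_sites - distance)


def total_probability(n_sites, mu, alphabet_size):
    """Sum of the mutation probability over all target sequences."""
    return sum(
        shell_size(n_sites, d, alphabet_size)
        * mutation_probability(d, n_sites, mu, alphabet_size)
        for d in range(n_sites + 1)
    )


def most_populated_distance(n_sites, alphabet_size):
    """Distance at which the number of sequences is largest."""
    return max(
        range(n_sites + 1),
        key=lambda d: shell_size(n_sites, d, alphabet_size),
    )


print(total_probability(12, 0.03, 4))
print(most_populated_distance(100, 4))
```

The first number should equal one up to rounding, and the second should sit at three quarters of the sequence length for a four-letter alphabet.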

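Similar to the transition from Wrightian to Malthusian fitness, in the continuous
time limit the mutation probability (3) has to be replaced by the
mutation rate $\gamma(\sigma' \to \sigma)$, such that for generation time $\Delta t \to 0$

$$p(\sigma' \to \sigma) = \delta_{\sigma,\sigma'} + \gamma(\sigma' \to \sigma)\Delta t. \qquad (5)$$

Denoting the mutation rate per letter by $\tilde{\mu}$ and setting $\mu = \tilde{\mu}\Delta t$ in (3) yields

$$\gamma(\sigma' \to \sigma) = \begin{cases} 0, & d(\sigma',\sigma) > 1 \\ \tilde{\mu}/(\ell - 1), & d(\sigma',\sigma) = 1 \\ -N\tilde{\mu}, & d(\sigma',\sigma) = 0. \end{cases} \qquad (6)$$

The normalization condition for mutation rates reads $\sum_{\sigma} \gamma(\sigma' \to \sigma) = 0$.

2.2 Mutation-selection models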

We now discuss models of adaptation that incorporate the two competing 
processes discussed above, namely, mutation and selection. While mutation 
increases genetic diversity, selection tends to confine the population to fit
sequences. If selection wins out, one obtains a population in which individuals
are genetically closely related; otherwise, a heterogeneous population distributed
over the entire sequence space results. In this article, we will mainly discuss
the so-called coupled models, in which the mutations occur only during replication.
In the paramuse models, on the other hand, mutation and selection
occur in parallel, and they will be discussed here briefly. We refer the reader
for more details to the reviews [10, 11] and references therein. While one may 
expect both types of mutation mechanisms to be relevant in describing evo- 
lution, the jury is still out on their relative importance. For this reason, both 
classes of models have been analysed in detail and the relationship between 
them has been explored, with regard to both static [23] and dynamic [24] 
properties. 

The models discussed below work under the following two assumptions: 

(i) Infinite population, i.e., the total population size $M \gg \ell^N$, the total
number of genotypes available. Under this assumption a deterministic description
suffices and we can write down the time evolution equation for
the average population fraction $X(\sigma,t)$ of sequence $\sigma$ at time $t$. Although
this is often unrealistic, the analysis is simpler in this limit which in many 
cases can be adapted to the finite population case to provide quantitative 
agreement with experiments [25, 26, 27]. The infinite population limit can 
be justified if the population is known to be localized in a small region of 
sequence space around a fitness peak, if one is interested in a short piece of 
the genome such as a single regulatory binding site [16] (see also Sect. 3.4) 
or if one works in the population genetics setting, where the letters in the 
sequence are alleles of a gene, rather than single nucleotides. 

(ii) Asexual reproduction, which dominates in the lower forms of life such as
viruses and bacteria, and in digital organisms. We will mainly consider haploid
organisms, but diploids are briefly discussed in Section 3.5. However, we do
not consider the case of sexual reproduction; a comparison between sexual
and asexual reproduction modes in the context of sequence space models
can be found in [28].



Paramuse models

In the paramuse models, introduced by Crow and Kimura [6], one assumes
error-free replication, and mutations are induced by the environment through
radiation, thermal fluctuations etc. [10]. The equation for the rate of change
$\dot{X}(\sigma,t) = dX(\sigma,t)/dt$ of the fraction $X(\sigma,t)$ of the population with sequence
$\sigma$ is given by

$$\dot{X}(\sigma,t) = \Big[w(\sigma) - \sum_{\sigma'} w(\sigma')X(\sigma',t)\Big]X(\sigma,t) + \sum_{\sigma'} \gamma(\sigma' \to \sigma)X(\sigma',t). \qquad (7)$$

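The first term is the selection term while the contribution from the mutations
is contained in the last term. The evolution equation (7) is nonlinear in $X(\sigma,t)$
due to the second term on the right hand side, which is required to ensure the
normalisation $\sum_{\sigma} X(\sigma,t) = 1$. This nonlinearity can be eliminated by passing
to unnormalised population variables $Z(\sigma,t)$ defined by

$$Z(\sigma,t) = X(\sigma,t) \exp\Big[\sum_{\sigma'} w(\sigma') \int_0^t d\tau\, X(\sigma',\tau)\Big], \qquad (8)$$

which satisfy the linear equation [29]

$$\frac{\partial Z(\sigma,t)}{\partial t} = w(\sigma)Z(\sigma,t) + \sum_{\sigma'} \gamma(\sigma' \to \sigma)Z(\sigma',t). \qquad (9)$$

Equation (7) follows from (9) using the relation

$$X(\sigma,t) = \frac{Z(\sigma,t)}{\sum_{\sigma'} Z(\sigma',t)}. \qquad (10)$$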
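
For completeness, this step can be spelled out as follows. It is a sketch that assumes only that the normalised fractions are obtained by dividing each unnormalised variable by their sum, $X(\sigma,t) = Z(\sigma,t)/\sum_{\sigma'} Z(\sigma',t)$, and that the mutation rates out of any sequence sum to zero. Differentiating the quotient gives

$$\dot{X}(\sigma,t) = \frac{\dot{Z}(\sigma,t)}{\sum_{\sigma'} Z(\sigma',t)} - X(\sigma,t)\,\frac{\sum_{\sigma''} \dot{Z}(\sigma'',t)}{\sum_{\sigma'} Z(\sigma',t)}, \qquad \sum_{\sigma''} \dot{Z}(\sigma'',t) = \sum_{\sigma'} w(\sigma')Z(\sigma',t),$$

where the second identity uses the linear equation (9) together with $\sum_{\sigma} \gamma(\sigma' \to \sigma) = 0$. Inserting (9) into the first term yields $w(\sigma)X(\sigma,t) + \sum_{\sigma'} \gamma(\sigma' \to \sigma)X(\sigma',t)$, while the second term becomes $-X(\sigma,t)\sum_{\sigma'} w(\sigma')X(\sigma',t)$; together these reproduce (7).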

Inserting the explicit form (6) for the mutation rates, it can be shown that
the vector $\mathbf{Z}(t) = (Z(\sigma^1,t), \ldots, Z(\sigma^S,t))$, where the index labels the $S = \ell^N$
points in sequence space, obeys a Schrödinger equation in imaginary time

$$\frac{d\mathbf{Z}(t)}{dt} = -\mathcal{H}\,\mathbf{Z}(t) \qquad (11)$$

with quantum spin Hamiltonian $\mathcal{H}$ in one dimension. Specifically, for $\ell = 2$,
one obtains the Hamiltonian of an Ising chain in the presence of a transverse
magnetic field (mutations) with general interactions (specified by the fitness
landscape) [29]; for an explicit example see Eq. (40). This model has been
solved exactly for a variety of fitness landscapes using methods of quantum
statistical physics [24, 29, 30]. A similar analysis has also been carried out for
the biologically relevant case of $\ell = 4$ [31].

Coupled (quasispecies) dynamics

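In the quasispecies model introduced by Eigen in the context of prebiotic
evolution [4, 32, 33], the mutations are copying errors that occur during the
reproduction process. This implies that the population fraction $X(\sigma,t)$ evolves
according to

$$\dot{X}(\sigma,t) = \sum_{\sigma'} p(\sigma' \to \sigma)W(\sigma')X(\sigma',t) - \Big[\sum_{\sigma'} W(\sigma')X(\sigma',t)\Big]X(\sigma,t), \qquad (12)$$

which can be linearised by a transformation analogous to (8) to yield the
linear equation

$$\dot{Z}(\sigma,t) = \sum_{\sigma'} p(\sigma' \to \sigma)W(\sigma')Z(\sigma',t). \qquad (13)$$

In discrete time this model takes the form

$$X(\sigma,t+1) = \frac{\sum_{\sigma'} p(\sigma' \to \sigma)W(\sigma')X(\sigma',t)}{\sum_{\sigma'} W(\sigma')X(\sigma',t)}, \qquad (14)$$

where the denominator arises due to the normalisation. The discrete time
analog of the transformation (8) is given by

$$Z(\sigma,t) = X(\sigma,t) \prod_{\tau=0}^{t-1} \sum_{\sigma'} W(\sigma')X(\sigma',\tau). \qquad (15)$$

As before, the unnormalised variables obey a linear equation given by

$$Z(\sigma,t+1) = \sum_{\sigma'} p(\sigma' \to \sigma)W(\sigma')Z(\sigma',t). \qquad (16)$$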

The use of the Wrightian (discrete time) fitness $W(\sigma)$ in the continuous
time equation (12) requires some explanation. First, it ensures that the stationary
solutions of (12) and (14) are identical. Second, it reflects the fact
that (12) is invariant (up to a rescaling of time) under multiplication of the
fitnesses by a constant factor, $W(\sigma) \to CW(\sigma)$, which is an exact symmetry
of the discrete time equation (14), whereas the continuous time paramuse dynamics
(7) is invariant under additive shifts $w(\sigma) \to w(\sigma) + C$ [10, 23]. In fact,
(12) is not the continuous time limit of (14). Instead, inserting (1) and (5) in
(14) and taking $\Delta t \to 0$, one obtains the paramuse dynamics (7). In this sense
(12) is somewhat intermediate between the discrete time model (14) and the
continuous time dynamics (7).
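
The discrete time coupled dynamics and its linearised counterpart are simple enough to iterate directly on a small sequence space. The sketch below is an illustration only: it assumes a binary alphabet, a sequence length of four, a single favoured sequence with fitness 4 and a per-site copying error of 0.05 (all values chosen by us), and it enumerates the full space, which is feasible only for short sequences.

```python
from itertools import product


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def mutation_matrix(sequences, mu):
    """p[i][j]: probability that sequence j is copied into sequence i."""
    n = len(sequences[0])
    matrix = []
    for target in sequences:
        row = []
        for source in sequences:
            d = hamming(source, target)
            row.append(mu ** d * (1.0 - mu) ** (n - d))
        matrix.append(row)
    return matrix


def quasispecies_step(x, fitness, p):
    """One generation of the normalised dynamics (division by mean fitness)."""
    size = len(x)
    mean_fitness = sum(w * xi for w, xi in zip(fitness, x))
    return [
        sum(p[i][j] * fitness[j] * x[j] for j in range(size)) / mean_fitness
        for i in range(size)
    ]


def linear_step(z, fitness, p):
    """One generation of the linear dynamics for unnormalised variables."""
    size = len(z)
    return [sum(p[i][j] * fitness[j] * z[j] for j in range(size)) for i in range(size)]


def iterate(step, state, fitness, p, generations):
    for _ in range(generations):
        state = step(state, fitness, p)
    return state


sequences = list(product((1, -1), repeat=4))    # entry 0 is (1, 1, 1, 1)
fitness = [4.0] + [1.0] * (len(sequences) - 1)  # entry 0 is the favoured sequence
p = mutation_matrix(sequences, mu=0.05)
uniform = [1.0 / len(sequences)] * len(sequences)

x = iterate(quasispecies_step, uniform, fitness, p, 200)
z = iterate(linear_step, uniform, fitness, p, 200)
z_normalised = [zi / sum(z) for zi in z]
print(x[0], z_normalised[0])
```

Both routes give the same stationary distribution, which is the practical content of the linearisation: the normalisation can be postponed to the end without changing the result.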

For the discrete time model (14) one can represent the evolutionary histories
as configurations on a two-dimensional lattice with the two axes directed
along the sequence and along time, with a spin variable $\sigma_i(t)$ at each site.
Writing the evolution equation (16) for the vector $\mathbf{Z}(t)$ in the form

$$\mathbf{Z}(t+1) = T_{t+1,t}\,\mathbf{Z}(t) \qquad (17)$$

then suggests interpreting $T_{t+1,t}$ as the transfer matrix of a two-dimensional
classical spin model which relates the probability of a configuration in one
row of the lattice to the next one [34]. For $\ell = 2$, this $2^N \times 2^N$ matrix can be
written (up to a multiplicative constant) as

$$T_{t+1,t}[\{\sigma_i(t+1)\}, \{\sigma_i(t)\}] = \exp\Big[\ln W(\{\sigma_i(t)\}) + J\sum_{i=1}^{N} \sigma_i(t+1)\sigma_i(t)\Big] \qquad (18)$$

where

$$J = \frac{1}{2}\ln(\mu^{-1} - 1). \qquad (19)$$

Thus $T_{t+1,t} = \exp[-H]$, where $H$ is the Hamiltonian of a two-dimensional
Ising model* with nearest neighbor interactions of strength $J$ along the time
direction and general interactions [determined by the fitness landscape $W(\sigma)$]
along the sequence direction [35]. The expression (19) shows that the interactions
along the time direction are ferromagnetic (antiferromagnetic) whenever
$\mu < 1/2$ ($\mu > 1/2$), while for $\mu = 1/2$ the sequence is completely randomized
in each time step and the interaction vanishes.

Clearly, to obtain the distribution of sequences at time slice $t$, one needs
to solve iteratively for all the $t - 1$ preceding layers. In the steady state, for
which $t \to \infty$, one requires the properties of the last "surface" layer coupled
to a semi-infinite "bulk". Since the transfer matrix (18) does not contain any
couplings along the sequence direction in the last layer $t + 1$, the boundary
condition for this semi-infinite spin model corresponds to a free surface [36].
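
The value of the coupling along the time direction can be checked by matching the binary copying probability to a product of Boltzmann factors. As a sketch, let $d$ denote the Hamming distance between the spin configurations in two consecutive rows, and assume the two-letter form $\mu^d (1-\mu)^{N-d}$ of the copying probability. Then

$$\sum_{i=1}^{N} \sigma_i(t+1)\sigma_i(t) = N - 2d, \qquad e^{J(N - 2d)} = e^{JN}\big(e^{-2J}\big)^{d}, \qquad \mu^{d}(1-\mu)^{N-d} = (1-\mu)^{N}\Big(\frac{\mu}{1-\mu}\Big)^{d}.$$

Equating the $d$-dependent factors gives $e^{-2J} = \mu/(1-\mu)$, that is, $J = \frac{1}{2}\ln[(1-\mu)/\mu] = \frac{1}{2}\ln(\mu^{-1} - 1)$. The remaining factors do not depend on $d$ and are absorbed into the multiplicative constant.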

3 Simple fitness landscapes 

So far we have discussed the general equations governing the evolution of a 
population with mutations, but the fitness landscape was not specified. We do 
so now and begin with landscapes that are "simple" in that the fitness depends 
only on the distance from a given (master) sequence, which is usually the
genotype of highest fitness**. Such landscapes are called permutation invariant,
because the fitness depends only on the number of mismatches relative to
the master sequence, but not on their position. Using this symmetry, the
population variables can be grouped into $N + 1$ error classes, which greatly
facilitates both numerical and analytic work [37].
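
To illustrate how the grouping into error classes reduces the problem, the mutation probabilities between classes can be written down explicitly for a binary alphabet. The following is a minimal sketch under that assumption (it is our own construction for illustration, not the scheme of [37]); the names `class_transition` and `class_matrix` are hypothetical.

```python
from math import comb


def class_transition(k_from, k_to, n_sites, mu):
    """Probability that a binary sequence with k_from mismatches relative to
    the master sequence is copied into a sequence with k_to mismatches."""
    total = 0.0
    for back in range(k_from + 1):          # mismatched sites that revert
        forward = k_to - k_from + back      # matching sites that newly mutate
        if 0 <= forward <= n_sites - k_from:
            flips = back + forward
            total += (
                comb(k_from, back)
                * comb(n_sites - k_from, forward)
                * mu ** flips
                * (1.0 - mu) ** (n_sites - flips)
            )
    return total


def class_matrix(n_sites, mu):
    """(N + 1) x (N + 1) matrix q[k_to][k_from] of class-to-class probabilities."""
    return [
        [class_transition(k_from, k_to, n_sites, mu) for k_from in range(n_sites + 1)]
        for k_to in range(n_sites + 1)
    ]


q = class_matrix(20, 0.01)
print(len(q), sum(q[k_to][5] for k_to in range(21)))
```

For twenty sites the full mutation matrix would have about a million rows, whereas the class matrix has only 21; each of its columns sums to one, as it must for a probability distribution over the target classes.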

* The Ising Hamiltonian $H$ should not be confused with the Hamiltonian $\mathcal{H}$ of the
quantum spin chain in (11).
** In the context of population genetics, the master genotype is often referred to as
the wild type.

3.1 The error threshold: Preliminary considerations

Much of this section is devoted to a discussion of the error threshold phenomenon,
which refers to the loss of genetic integrity when mutations are
increased beyond a certain threshold. We consider only the stationary population
distribution which is established after a long time. The linearity of
both the continuous and discrete time evolution equations (9, 13, 16) implies
that the stationary distribution is identical to the principal eigenvector of
the matrix multiplying the population vector on the right hand side, i.e., the
eigenvector with the largest eigenvalue. The principal eigenvalue is related to
the mean population fitness in the stationary state. In this sense, the analysis
of different fitness landscapes and mutation schemes is reduced to the investigation
of the spectral properties of the corresponding evolution matrices
[38].

The error threshold separates two regimes of mutation-selection balance 
characterized by a qualitatively different structure of the principal eigenvector. 
For small mutation rates the eigenvector is localized around the master se- 
quence, i.e. only the entries corresponding to the dominant genotype and a few 
of its nearby mutants carry appreciable weight. Following Eigen and Schuster 
[32], such a localized population distribution is referred to as a quasispecies. 
When the mutation rate is increased beyond the error threshold, the prin- 
cipal eigenvector becomes delocalized and the population spreads uniformly 
throughout the sequence space. In this regime finite population effects, which
are neglected in the models considered here, become extremely important: 
Rather than covering the entire sequence space, which is impossible given the 
vast number of sequences, a finite population forms a localized cloud which 
wanders about randomly [39]. 

Since the eigenvectors and eigenvalues of any finite matrix depend smoothly
on its entries, the error threshold can become sharp, in the sense of being associated
with some non-analytic behavior of the population distribution or
the mean population fitness, only in the limit $N \to \infty$. We shall see below
that in order to maintain the localized quasispecies in this limit, it is usually
necessary to either reduce the single site mutation probability $\mu$, such that
the mutation probability per genome $\mu N$ remains constant, or to increase the
selective advantage of the master sequence with increasing $N$.

3.2 Error threshold in the sharp peak landscape

We demonstrate the error threshold in the case of a single sharp peak landscape
which is defined as

$$W(\sigma) = W_0\,\delta_{\sigma,\sigma_0} + (1 - \delta_{\sigma,\sigma_0}), \quad W_0 > 1. \qquad (20)$$

Here $\sigma_0$ denotes the master sequence, and $W_0$ is the selective advantage of
the master sequence relative to the other sequences, whose Wrightian fitness
has been normalized to unity. We anticipate the error threshold to occur for
$\mu \to 0$, $N \to \infty$, keeping the mutation rate per genome $\mu N$ finite. Let us
consider the coupled model in discrete time* defined by (14) with the choice
(20). In the limit $\mu \to 0$ the mutations taking the mutants back into the master
sequence can be neglected**, and the only nonzero contribution to $X(\sigma_0)$ on
the right hand side of (14) is that for $\sigma' = \sigma_0$. This yields

$$X(\sigma_0) = \frac{W_0 e^{-\mu N} - 1}{W_0 - 1}, \qquad (21)$$

which is an acceptable solution provided $\mu N < \ln W_0$. Thus, a phase transition
occurs at the critical mutation probability

$$\mu_c = \frac{\ln W_0}{N} \qquad (22)$$

beyond which the population cannot be maintained at the peak of the landscape.
Close to $\mu_c$, the fraction of population at the master sequence behaves
as

$$X(\sigma_0) \approx \frac{N(\mu_c - \mu)}{W_0 - 1}, \qquad (23)$$

thus approaching zero continuously at $\mu_c$. The above results are also confirmed
by a detailed numerical analysis for finite $\mu$ and $N$, in which the population
was grouped into error classes at constant Hamming distance from the master
sequence and the population in the error classes as well as the eigenvalues of
the evolution matrix were followed as a function of $\mu$ [37].

* Recall that in the steady state, both versions of the coupled model are identical.




Fig. 2. The continuous transition in the fraction $X(\sigma_0)$ of the master
sequence and the (almost) discontinuous one in the overlap $m$ as a function of
mutation rate $\mu$, for $N = 1000$ and $W_0 = 4$.

** Neglecting back mutations towards the master sequence is common in population
genetics, where it is referred to as a unidirectional mutation scheme [11]. It simplifies
the analytic treatment [28, 40], and will be used repeatedly in this article
as an approximation which is expected to become exact for $\mu \to 0$, $N \to \infty$.
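
The stationary master fraction and the critical mutation probability of the sharp peak landscape are easily evaluated numerically. The sketch below assumes the stationary solution obtained by neglecting back mutations, with the copying fidelity approximated by an exponential of the genomic mutation probability; the parameter values are those quoted in the caption of Fig. 2, and the function names are ours.

```python
import math


def critical_mutation_probability(w0, n_sites):
    """Per-site mutation probability at which the master fraction vanishes."""
    return math.log(w0) / n_sites


def master_fraction(mu, w0, n_sites):
    """Stationary master fraction without back mutations (zero above threshold)."""
    value = (w0 * math.exp(-mu * n_sites) - 1.0) / (w0 - 1.0)
    return max(value, 0.0)


def master_fraction_near_threshold(mu, w0, n_sites):
    """Linear approximation valid just below the threshold."""
    mu_c = critical_mutation_probability(w0, n_sites)
    return n_sites * (mu_c - mu) / (w0 - 1.0)


w0, n_sites = 4.0, 1000
mu_c = critical_mutation_probability(w0, n_sites)
for mu in (0.0, 0.5 * mu_c, 0.99 * mu_c, 1.5 * mu_c):
    print(f"mu = {mu:.6f}  X(sigma_0) = {master_fraction(mu, w0, n_sites):.4f}")
```

Just below the threshold the exact expression and the linear approximation agree closely, which is the continuous vanishing of the master fraction described above.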
The way in which the error threshold condition (22) combines mutation
rate, sequence length and selective advantage is the central result of quasispecies
theory. In particular, it shows that in order to maintain a localized quasispecies
at finite single site mutation rate in the limit $N \to \infty$, the selective
advantage has to increase exponentially with $N$ [41]. Under the assumption
that typical selective advantages do not depend strongly on sequence length,
Eq. (22) also provides some rationalization for the observation that the product
$\mu N$ is roughly constant within classes of similar organisms (see Sect. 2.1).
On the other hand, at given achievable values of the replication accuracy and
the selective advantage, the condition $\mu < \mu_c$ places an upper bound $N_{\max}$
on the sequence length, beyond which genetic integrity is lost. Elsewhere in
this book Ester Lazaro presents substantial evidence that RNA viruses have
evolved to reside close to this threshold, possibly because this allows them to
maintain a maximal genetic variability which is needed to rapidly adapt to
changing environments (see also Sect. 6.2).
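
As a rough numerical illustration of this bound, the threshold condition can be inverted for the sequence length. The value of $\mu$ below is chosen purely for illustration, while $W_0 = 4$ is the value used in Fig. 2:

$$N_{\max} = \frac{\ln W_0}{\mu}, \qquad W_0 = 4,\ \mu = 10^{-3} \ \Rightarrow\ N_{\max} = \frac{1.386}{10^{-3}} \approx 1.4 \times 10^{3}.$$

Because the dependence on $W_0$ is only logarithmic, even a hundredfold larger selective advantage, $W_0 = 400$, raises the bound only to $\ln 400/\mu \approx 6.0 \times 10^{3}$ at the same mutation probability.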

Neglecting back mutations to the master sequence allows one to derive an
expression for the mean Hamming distance to the master sequence, which
reads [40]

$$\langle d(\sigma,\sigma_0) \rangle = \frac{\mu N\, W_0 e^{-\mu N}}{W_0 e^{-\mu N} - 1}. \qquad (24)$$

The mean Hamming distance is finite for $\mu < \mu_c$ and diverges as $(\mu_c - \mu)^{-1}$
as the error threshold is approached. This provides an alternative characterisation
of the threshold. A related quantity, which has been proposed as an
order parameter for the transition, is the mean overlap

$$m = 1 - \frac{2\langle d(\sigma,\sigma_0) \rangle}{N} \qquad (25)$$

between the master sequence and a randomly chosen sequence [41]. Since
$\langle d(\sigma,\sigma_0) \rangle$ remains finite for $N \to \infty$ in the localised phase, the overlap is
$m = 1$ in this limit and jumps discontinuously to $m = 0$ at the threshold. Figure 2,
displaying the two order parameters considered in the above discussion,
illustrates that the nature of the transition - continuous or discontinuous -
depends to some extent on the quantity under consideration*.

Yet another characterization of the error threshold relies on the notion of
the consensus sequence $\sigma^c$, which carries at each site $i$ that letter $\sigma_i^c$ which
is most frequently represented in the population. It is easy to see that, for
symmetry reasons, the consensus sequence in the sharp peak landscape (20)
coincides with the master sequence, $\sigma^c = \sigma_0$, throughout the localized phase;
this is true for general permutation-invariant single peak landscapes. In the
delocalized phase, where the population is uniformly spread throughout sequence
space for $N \to \infty$, all letters appear with equal probability and the
consensus sequence cannot be defined. This is an artifact of the assumption
of infinite population size: a finite population retains some genetic structure
even in a flat fitness landscape and diffuses through sequence space as a cloud
centered around a moving consensus sequence $\sigma^c(t)$ [39]. Thus at the error
threshold the consensus sequence ceases to be pinned to the master sequence
and becomes time-dependent. This criterion to locate the transition is particularly
useful in complex fitness landscapes, where the most-fit master sequence
is not known [42] (see Sect. 4). Similarly, in experimental studies of microbial
populations such as RNA viruses, the consensus sequence is taken to represent
the (unknown) wildtype genome, and the genetic spread of the population
around $\sigma^c$ is interpreted as a measure of the balance between mutational and
selective forces (see the chapter by E. Lazaro).

* In contradiction to the discussion above, a numerical study based on the mapping
to a two-dimensional Ising model described in Sect. 2.2 deduced that both $m$ and
$X(\sigma_0)$ change smoothly at the transition [36]. However, in this study a scaling
analysis with genome length (akin to finite size scaling analysis in statistical
mechanics) was not carried out to obtain the behavior in the limit $N \to \infty$.

3.3 Exact solution of a sharp peak model 

A variant of Eigen's model was solved exactly for any $N$ in [43]. The model
is defined in discrete time but the mutations are restricted to mutants within
Hamming distance equal to one, as for the continuous time mutation rates (6).
In addition, mutations are assumed to occur in the whole population before
the reproduction process. With the fitness landscape (20) this leads to the
linear evolution equation

$$Z(\sigma,t+1) = [1 + (W_0 - 1)\delta_{\sigma,\sigma_0}] \times \qquad (26)$$
$$\times \Big[(1 - N\mu)Z(\sigma,t) + \mu \sum_{\sigma':\, d(\sigma,\sigma')=1} Z(\sigma',t)\Big] \qquad (27)$$

for the unnormalised population variables. Note that the model is well defined
only for $N\mu < 1$.

At large times, $Z(\sigma,t+1) \approx \Lambda Z(\sigma,t)$ where $\Lambda$ is the largest eigenvalue of
the evolution matrix on the right hand side of (27). In the delocalised phase,
the population is spread over the entire sequence space with mean fitness
$W = 1$, so that $\Lambda = 1$, whereas in the localised phase, a finite fraction has
fitness $W_0 > 1$ and hence $\Lambda > 1$. For any $N$, the eigenvalue $\Lambda$ is determined
by the exact equation

$$\frac{1}{W_0 - 1} = \frac{1}{2^N} \sum_{k=0}^{N} \binom{N}{k} \frac{1 - 2\mu k}{\Lambda - 1 + 2\mu k}.$$

Due to the $k = 0$ term on the RHS of the above equation, it is evident that $\Lambda$
can take a value equal to 1 only in the $N \to \infty$ limit. Thus, there is no phase
transition for any finite $N$.

In the limit iV — > oo, /i — > with N/j. < 1 fixed the eigenvalue is given by 
the expression 



Z{a,t + !) = [! + [Wo -l)S,,„,]x 



(26) 



X (1 - Nfi)Z{a, t) + M ^ Zia', t) 




(27) 
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yl = max{l,Wo(l--/VM)}, (28) 
which sticks to unity beyond the critical mutation strength 

Mc = ^. (29) 
WqN ^ ' 

Incidentally, the above expression for $\mu_c$ can be obtained using (21) by
expanding the exponential to first order in $\mu N$. This is required to ensure
that $\mu N < 1$ is satisfied for any $W_0 > 1$. In both cases the selective
advantage needed to localize the quasispecies is the inverse of the copying
fidelity, i.e. the probability of creating an error-free offspring.
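
The absence of a sharp transition at finite $N$ can be checked numerically. The
following is a minimal sketch, not the method of [43]: it assumes binary
sequences, takes the evolution equation in the form
$Z(\sigma,t+1) = [1 + (W_0-1)\delta_{\sigma,\sigma_0}][(1-N\mu)Z(\sigma,t) + \mu\sum_{\sigma'} Z(\sigma',t)]$
with the sum over the $N$ one-mutant neighbours, and estimates the largest
eigenvalue by power iteration. The function names and parameter values are
illustrative choices.

```python
def step(Z, N, mu, W0):
    """One generation: mutation to Hamming-distance-one neighbours, then selection."""
    new = []
    for s in range(len(Z)):
        inflow = sum(Z[s ^ (1 << i)] for i in range(N))  # the N one-mutant neighbours
        fitness = W0 if s == 0 else 1.0                  # sharp peak; index 0 is the master
        new.append(fitness * ((1.0 - N * mu) * Z[s] + mu * inflow))
    return new

def largest_eigenvalue(N, mu, W0, tol=1e-13, max_iter=100000):
    """Power-iteration estimate of the largest eigenvalue Lambda."""
    if not 0.0 <= N * mu < 1.0:
        raise ValueError("the model is only defined for N*mu < 1")
    Z = [1.0 / 2 ** N] * (2 ** N)       # normalised uniform start
    lam = 0.0
    for _ in range(max_iter):
        Z = step(Z, N, mu, W0)
        new_lam = sum(Z)                # growth factor of the normalised population
        Z = [z / new_lam for z in Z]
        if abs(new_lam - lam) < tol:
            break
        lam = new_lam
    return new_lam

N, W0 = 8, 2.0
mu_c = (W0 - 1.0) / (W0 * N)            # asymptotic threshold (29)
for ratio in (0.5, 1.0, 1.5):
    mu = ratio * mu_c
    lam = largest_eigenvalue(N, mu, W0)
    asymptotic = max(1.0, W0 * (1.0 - N * mu))
    print(f"mu/mu_c = {ratio:.1f}  Lambda = {lam:.5f}  asymptotic (28) = {asymptotic:.5f}")
```

For finite $N$ the computed $\Lambda$ stays strictly above 1 on both sides of
$\mu_c$, in line with the statement that the transition only becomes sharp for
$N \to \infty$.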

The behavior of other quantities at the threshold follows from that of $\Lambda$.
For example, the fraction of the population residing at the master sequence
is given by

$$ X(\sigma_0) = \frac{W_0(\Lambda - 1)}{(W_0 - 1)\Lambda}, \tag{30} $$

which vanishes linearly in $\mu_c - \mu$ at the threshold, and the mean Hamming
distance from the master sequence is

$$ \langle d(\sigma, \sigma_0) \rangle = \frac{N\mu}{\Lambda - 1}, \tag{31} $$

which diverges as $(\mu_c - \mu)^{-1}$. The expressions (29)-(31) are valid in the
asymptotic limit $N \to \infty$, but systematic expansions of these quantities in
powers of $1/N$ are also available [43].

Comparing the expressions (23) and (24) to (30) and (31) respectively, we
see that $X(\sigma_0)$ and $\langle d(\sigma, \sigma_0) \rangle$ behave qualitatively similarly in
the two models as the error threshold is approached. This is a simple example
of the principle of universality commonly encountered at physical phase
transitions, which states that the way in which singular quantities vanish or
diverge at the transition is independent of detailed properties of the model.



3.4 Modifying the shape of the fitness peak 

Since the sharp peak landscape (20) was chosen for its simplicity, and not 
because it is expected to be biologically realistic, it is important to investigate 
how the error threshold phenomenology depends on the shape of the fitness 
peak. In this section we discuss some illustrative examples. A method for 
solving the stationary quasispecies equation for general peak shapes has been 
developed by Peliti [44]. It employs a strong selection limit, in which the fitness
is written as $W(\sigma) = \exp[N\phi(\sigma)]$ and the limit $N \to \infty$ is carried out at
fixed mutation probability $\mu$.
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Peak height versus peak width 

We first consider a landscape with one sharp global maximum and a broad
peak of lower fitness separated by a flat landscape. This is defined as

$$ W(\sigma) = \begin{cases} W_0 & : d(\sigma, \sigma_0) = 0 \\ W_{N-1} & : d(\sigma, \sigma_0) = N - 1 \\ W_N & : d(\sigma, \sigma_0) = N \\ 1 & : d(\sigma, \sigma_0) \neq 0, N-1, N \end{cases} \tag{32} $$

where the second peak is centred on the sequence at maximal Hamming distance
$N$ from $\sigma_0$, and $W_0 > W_N > W_{N-1} > 1$. By placing the two fitness peaks at
the two poles of the sequence space, the permutation symmetry of the landscape
is preserved and the population can be subdivided into error classes. The
coupled model with the landscape (32) has been studied in both continuous
[45] and discrete time [36]. Interestingly, with increasing mutation rate, the
quasispecies shifts abruptly from the sequence $\sigma_0$ to the broader peak around
the opposite pole, before finally delocalising over the whole sequence space.
For large mutation rates, the quasispecies is more comfortable at the lower
peak surrounded by an extended region of elevated fitness than at the
(globally optimal) isolated master sequence.



Mesa landscapes 

Broad fitness peaks arise naturally in the evolution of regulatory binding sites
[16, 46, 47]. In this context the fitness of a given regulatory sequence can be
plausibly related to the binding probability of the corresponding transcription
factor. Simple thermodynamic models predict that the binding probability
depends on the number of mismatches $d(\sigma, \sigma_0)$ with respect to the regulatory
master sequence $\sigma_0$ through a Fermi function,

$$ p_b(d) = \frac{1}{1 + \exp[\epsilon(d - d_0)/k_B T]}, \tag{33} $$

where $\epsilon$ is the binding energy per mismatch, $\epsilon d_0$ is the chemical potential
corresponding to the concentration of the transcription factor, and $k_B T$ is the
thermal energy at temperature $T$. For $\epsilon/k_B T \gg 1$ the binding probability
drops abruptly from $p_b = 1$ to $p_b = 0$ when $d$ exceeds the number $d_0$ of
tolerable mismatches; a typical value of this ratio is $\epsilon/k_B T \sim 2$.
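
To see how sharply the binding probability switches, the Fermi function can be
tabulated for a moderate and a large value of $\epsilon/k_B T$. This is only an
illustrative sketch: it assumes the form $p_b(d) = 1/(1 + \exp[\epsilon(d - d_0)/k_B T])$,
and the value $d_0 = 4$ and the function name are hypothetical choices.

```python
import math

def binding_probability(d, d0, eps_over_kT):
    """Fermi function: probability that the factor binds a site with d mismatches."""
    return 1.0 / (1.0 + math.exp(eps_over_kT * (d - d0)))

d0 = 4  # hypothetical number of tolerable mismatches
for ratio in (2.0, 20.0):
    profile = [round(binding_probability(d, d0, ratio), 3) for d in range(9)]
    print(f"eps/kT = {ratio}: {profile}")
```

At $\epsilon/k_B T = 2$ the profile is still visibly smooth around $d_0$, whereas for
the larger ratio it is practically a step function, which is the limit used for
the mesa landscape.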

In the simplest scenario, the selective advantage of a regulatory sequence
is assumed to be proportional to the binding probability. This leads to a
mesa-shaped fitness landscape, with a plateau of constant fitness and radius
$d_0$ around the master sequence. In [16] a detailed study of the error threshold
in this landscape was presented for continuous time paramuse dynamics with
fitness landscape $w(d) = w_0 p_b(d)$. An exact solution is possible in a limit
where $d$ becomes a continuous variable and the Fermi function (33) is replaced
by a step function. Provided $d_0 \ll N$, the error threshold is found to take
place at a critical mutation strength $\mu_c$ given by
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_ 2wo 

N{i + nVdiy ^^^^ 

where is a constant of order unity. The critical mutation strength is seen to 
increase with increasing $d_0$, illustrating the enhanced stability of the
quasispecies with increasing width of the fitness peak. In the localized phase,
the majority of the population is located near the mesa edge at $d = d_0$,
reflecting the exponential increase of the number (4) of available genotypes
with distance $d$. This is a purely entropic effect, which leads to a maximal
fuzziness of regulatory motifs.

Somewhat more realistically, one expects that the fitness depends not only 
on the ability of the sequence to bind the transcription factor in a certain 
cellular state, but also on its ability to avoid binding in other states. This 
can be modeled by a fitness function which is proportional to the difference 
between two Fermi functions (33) with different values of $d_0$, leading to a
crater landscape with a rim of high fitness around a fitness minimum at $d = 0$
[47].



Epistasis: Coupled dynamics 

Not all landscapes display the error threshold phenomenon. We illustrate this
point using the multiplicative (or Fujiyama) landscape as an example. In this
case

$$ W(\sigma) = \prod_{i=1}^{N} e^{\lambda \sigma_i} = \exp[\lambda(N - 2d(\sigma_0, \sigma))], \tag{35} $$

where for simplicity we choose $\ell = 2$ and let $\sigma_i$ take values $\pm 1$. For
$\lambda > 0$ the master sequence is $\sigma_0 = (1, 1, 1, \ldots, 1)$ and the Hamiltonian $H$
obtained from (18) is

$$ H = \sum_{i=1}^{N} [-J \sigma_i(t+1) \sigma_i(t) - \lambda \sigma_i(t)]. \tag{36} $$

Due to the absence of interactions along the sequence space direction, one
obtains, for each position $i$, a one-dimensional Ising model in the presence of
a magnetic field $\lambda$. This model is well known to lack a phase transition and,
due to the $\lambda$ term, the spins tend to align in the direction of the field.
Correspondingly, a finite fraction of the population is maintained at the master
sequence for any value of the mutation rate. The full population distribution
has been worked out in [48].

In genetic terms, the multiplicative form (35) implies that the different
gene loci contribute independently to the fitness, which is referred to as the
absence of epistatic interactions. In general, one must distinguish between
synergistic or negative epistasis, in which the (deleterious) effect of an
additional mutation increases with increasing distance from the wild type
(master sequence), and diminishing returns or positive epistasis, when the
effect of mutations decreases with increasing distance^7. The sharp peak
landscape (20) is an extreme case of positive epistasis, because after the first
mutation away from the master sequence, any additional mutation does not affect
the fitness at all. An extreme limit of negative epistasis is represented by the
case of truncation selection, where the Wrightian fitness vanishes beyond a
critical Hamming distance $d_c$ [49]. As we discuss below, whether or not an
error threshold occurs depends on the behavior of the landscape at large Hamming
distance from the master sequence.

Consider a general fitness landscape defined by [40]

$$ W(\sigma) = q(1 - s)^{d(\sigma, \sigma_0)^{\alpha}} + 1 - q, \tag{37} $$

where $0 \le q, s \le 1$ and $\alpha > 0$. Two cases need to be distinguished: for
$q < 1$, the lower bound on the fitness is nonzero and when $s \to 1$ it becomes
of sharp peak type (20) with the (relative) selective advantage $1/(1 - q)$ for
the master sequence, while for $q = 1$, the multiplicative form (35) with
$\lambda = -\ln(1 - s)$ is recovered for $\alpha = 1$, and $\alpha > 1$ ($\alpha < 1$) describes a
situation with negative (positive) epistasis (Fig. 3).
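
The three epistasis regimes can be made concrete by looking at the fitness ratio
$W(d+1)/W(d)$, i.e. the cost of one additional mutation. The sketch below assumes
the landscape in the form $W = q(1-s)^{d^{\alpha}} + 1 - q$ as a function of the
Hamming distance $d$ only; the function names and the parameter values are
illustrative, not taken from [40].

```python
def fitness(d, q, s, alpha):
    """Wrightian fitness at Hamming distance d from the master sequence."""
    return q * (1.0 - s) ** (d ** alpha) + 1.0 - q

def selection_ratios(q, s, alpha, d_max=6):
    """Ratios W(d+1)/W(d); smaller values mean a more harmful extra mutation."""
    return [fitness(d + 1, q, s, alpha) / fitness(d, q, s, alpha)
            for d in range(d_max)]

for alpha, label in ((0.5, "positive"), (1.0, "none"), (2.0, "negative")):
    ratios = [round(r, 3) for r in selection_ratios(q=1.0, s=0.1, alpha=alpha)]
    print(f"alpha = {alpha} ({label} epistasis): {ratios}")
```

For $\alpha = 1$ the ratio is constant (independent loci), for $\alpha > 1$ each further
mutation is more harmful than the last, and for $\alpha < 1$ the ratios approach 1
with increasing distance.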

The error threshold can be computed in the unidirectional approximation
(no back mutations towards the master sequence), and in the limit $N \to \infty$,
$\mu \to 0$, for $\alpha = 1$, $q < 1$, it has been shown that the critical mutation
strength is $\mu_c = N^{-1} \ln[1/(1 - q)]$, which is of exactly the same form as the
sharp peak result (22). For $q = 1$, a similar analysis shows that (37) displays
an error threshold only when $\alpha < 1$, with a critical mutation strength scaling
as $\mu_c \sim N^{\alpha - 1}$ [40]. Note that in this case, the correct scaling is
obtained in the limit $N \to \infty$, $\mu \to 0$ keeping $\mu N^{1-\alpha}$ fixed.

The above results can be understood using the following result for general
bounded Wrightian fitness landscapes with $0 < W_{\min} \le W(\sigma) \le W_{\max} < \infty$.
For such landscapes, the master sequence is lost from the population at a
critical mutation probability which satisfies (in the unidirectional
approximation and for $N \to \infty$) [40]

$$ \mu_c \le \frac{1}{N} \ln(W_{\max}/W_{\min}); \tag{38} $$

a similar result is proved in [50]. If the right hand side of (38) diverges as
$N \to \infty$, this would imply that there is no finite error threshold and the
master sequence is maintained at any mutation rate, while its vanishing would
be consistent with the existence of a sharp transition for $\mu \to 0$, $N \to \infty$.
For $q = 1$, the ratio between the largest and the smallest fitness is
$W_{\max}/W_{\min} = e^{\lambda N^{\alpha}}$, so that the right hand side of (38) vanishes for
$N \to \infty$ only when $\alpha < 1$, whereas it goes to zero for any $\alpha > 0$ for
$q < 1$, in agreement with the results cited above. The case of the
multiplicative landscape (35) is special; here $W_{\max}/W_{\min} = e^{\lambda N}$ and (38)
would suggest a finite error threshold^8. However, as discussed earlier, the
master sequence is maintained at any mutation rate for $\alpha = q = 1$.

^7 This nomenclature is based on [11, 40], but it does not appear to be
unambiguous; in [49] a definition of positive and negative epistasis is used
which is opposite to the present one.

^8 The unidirectional approximation erroneously predicts a transition at
$\mu_c = s$ [40].
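
The behavior of the bound in the different cases can be written out explicitly.
The short calculation below is a sketch under two assumptions: that the
landscape (37) has the form $W = q(1-s)^{d^{\alpha}} + 1 - q$, and that (38) reads
$\mu_c \le N^{-1}\ln(W_{\max}/W_{\min})$. The abbreviation $\lambda = -\ln(1-s)$ is the
one used for the multiplicative case.

```latex
% Case q = 1: the fitness falls monotonically from W(d=0) = 1 to W(d=N) = (1-s)^{N^\alpha}
\frac{W_{\max}}{W_{\min}} = (1-s)^{-N^{\alpha}} = e^{\lambda N^{\alpha}}
\quad\Longrightarrow\quad
\frac{1}{N}\ln\frac{W_{\max}}{W_{\min}} = \lambda N^{\alpha-1}
\;\xrightarrow[N\to\infty]{}\;
\begin{cases}
0       & : \alpha < 1 \\
\lambda & : \alpha = 1 \\
\infty  & : \alpha > 1
\end{cases}

% Case q < 1: the fitness is bounded below by 1 - q for every N
\frac{W_{\max}}{W_{\min}} \le \frac{1}{1-q}
\quad\Longrightarrow\quad
\frac{1}{N}\ln\frac{W_{\max}}{W_{\min}} \le \frac{1}{N}\ln\frac{1}{1-q}
\;\xrightarrow[N\to\infty]{}\; 0
```

Only the marginal case $\alpha = q = 1$ gives a finite, nonzero bound, which is why
the bound alone cannot decide whether a threshold exists there.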



Fig. 3. Illustration of the fitness landscape (37) with $s = q = 0.5$ and three
different values of $\alpha$: $\alpha < 1$ (positive epistasis), $\alpha = 1$ (null) and
$\alpha > 1$ (negative epistasis).



The general conclusion from these considerations is that the existence of
an error threshold requires positive epistasis. This can be understood from
the following qualitative argument [40]: For the case of positive epistasis, the
selection force towards the fitness peak that has to be overcome by mutations 
is largest close to the peak; once this initial barrier has been surpassed, the 
population delocalises completely. In contrast, for negative epistasis, each ad- 
ditional step away from the fitness peak requires a larger mutation pressure 
than the previous step, and hence the population remains localised. 

Epistasis: Paramuse models 

Since Malthusian fitness is essentially the logarithm of Wrightian fitness, the
absence of epistatic interactions in continuous time models implies a linear
dependence of the fitness $w(\sigma)$ on $d(\sigma, \sigma_0)$. To investigate the effects of
epistasis, a quadratic fitness landscape of the form

$$ w(\sigma) = a[1 - 2d(\sigma, \sigma_0)/N] + \frac{b}{2}[1 - 2d(\sigma, \sigma_0)/N]^2 \tag{39} $$






has been considered [11], with $a > 0$ and $b > 0$ ($b < 0$) for positive (negative)
epistasis. This choice of parameters leads, through the mapping described in
Sect. 2.2, to the quantum spin Hamiltonian

$$ H = \mu \sum_{i=1}^{N} (\sigma_i^x - 1) + a \sum_{i=1}^{N} \sigma_i^z + \frac{b}{2N} \sum_{i,j} \sigma_i^z \sigma_j^z, \tag{40} $$

where $\sigma_i^x$ and $\sigma_i^z$ denote the x- and z-components of the quantum mechanical
spin operator. As in the discrete time case, in the absence of epistasis ($b = 0$)
the spins at different sites $i$ are independent. Epistasis introduces a coupling
between any pair $i, j$ of spins, independent of their position in the sequence.
In the language of statistical mechanics, this is an interaction of mean field
type; it is ferromagnetic for $b < 0$ and antiferromagnetic for $b > 0$.

An explicit solution of the model has been presented for the case $a = 0$,
$b > 0$ [11]. In the limit $N \to \infty$, the mean overlap (25) is given by the
expression

$$ m = \max[1 - \mu/b, 0], \tag{41} $$

which, in contrast to the case of the sharp peak landscape, vanishes
continuously at $\mu = b$. In general, an error threshold exists only if
$-b < a < 0$. This implies that the fitness displays a minimum at a distance
$0 < d_{\min} < N/2$ from the master sequence.
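
The link between the condition $-b < a < 0$ and the interior fitness minimum can
be verified directly. The following sketch assumes the quadratic landscape in
the form $w = a m + (b/2) m^2$ with $m = 1 - 2d/N$, for which the minimum sits at
$m = -a/b$, i.e. $d_{\min} = N(1 + a/b)/2$; the function names and parameter
values are illustrative.

```python
def malthusian_fitness(d, N, a, b):
    """Quadratic landscape as a function of the Hamming distance d."""
    m = 1.0 - 2.0 * d / N
    return a * m + 0.5 * b * m * m

def mean_overlap(mu, b):
    """Mean overlap (41) for a = 0, b > 0 in the limit of large N."""
    return max(1.0 - mu / b, 0.0)

def fitness_minimum(N, a, b):
    """Distance of the interior fitness minimum, or None if -b < a < 0 fails."""
    if not (b > 0 and -b < a < 0):
        return None
    return 0.5 * N * (1.0 + a / b)

N, a, b = 20, -0.5, 1.0
d_min = fitness_minimum(N, a, b)
d_numeric = min(range(N + 1), key=lambda d: malthusian_fitness(d, N, a, b))
print("analytic d_min:", d_min, " numerical argmin:", d_numeric)
print("overlap at mu = 0.5 b:", mean_overlap(0.5 * b, b))
```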

3.5 Beyond the standard model 

In this section we discuss a few biologically motivated generalisations of the 
mutation-selection models described so far, while however maintaining the 
basic simplicity of the fitness landscape. 

Diploid models 

The evolution equations for diploid organisms are similar to those for the
haploid case, except that the fitness $W(\sigma)$ is replaced by the marginal fitness

$$ W(\sigma, t) = \sum_{\sigma'} W(\sigma, \sigma') X(\sigma', t), \tag{42} $$

where $W(\sigma, \sigma')$ is the fitness of an individual with diploid genotype
$(\sigma, \sigma')$, and $X(\sigma, t)$ is the fraction of individuals carrying sequence $\sigma$ in
either one of their two sets of genes [23, 51]. The analog of the sharp peak
landscape (20) is given by

$$ W(\sigma, \sigma') = \begin{cases} W_0 & : \sigma = \sigma' = \sigma_0 \\ W_1 & : \text{either } \sigma = \sigma_0 \text{ or } \sigma' = \sigma_0 \\ W_2 & : \text{both } \sigma, \sigma' \neq \sigma_0 \end{cases} \tag{43} $$

with $W_0 \ge W_1 \ge W_2$. In the absence of dominance effects ($W_1 = \sqrt{W_0 W_2}$
for Wrightian fitness or $w_1 = (w_0 + w_2)/2$ for Malthusian fitness) the problem
can be reduced to the haploid case. However, in general, a transformation
to a linear equation, as described in Sect. 2.2, is unknown for the diploid
case; the equations are inherently nonlinear because of the dependence of
the marginal fitness (42) on the population distribution. As a consequence,
there are multiple solutions for the fraction $X(\sigma_0)$ of wild type individuals.
Nevertheless, error threshold phenomena occur whose locations depend on the
relative values of $W_0$, $W_1$ and $W_2$. For instance, the critical mutation rate
is roughly doubled as compared to the haploid model in the case of complete
dominance of the wild type ($W_0 = W_1 > W_2$).

Semiconservative replication 

While the quasispecies model described in Sect. 2.2 is appropriate for
organisms with RNA as genetic material, it needs to be amended for DNA-based
organisms. The genotype corresponding to a double stranded DNA molecule can
be represented by $\{\sigma, \bar\sigma\}$ where $\bar\sigma$ is the complementary strand of $\sigma$. The
replication process involves splitting the DNA and pairing each strand with the
complementary bases to produce two daughter DNA's. Thus, only one strand
of the original DNA is conserved in the daughter DNA. However, copying
errors and subsequent (imperfect) repair result in a different DNA genotype
$\{\sigma', \bar\sigma'\}$. Thus, the (unnormalised) number of individuals of genotype
$\{\sigma, \bar\sigma\}$ evolves in time as [52]

$$ \dot{Z}(\{\sigma, \bar\sigma\}, t) = -W(\{\sigma, \bar\sigma\}) Z(\{\sigma, \bar\sigma\}, t) + \sum_{\{\sigma', \bar\sigma'\}} [p(\sigma' \to \{\sigma, \bar\sigma\}) + p(\bar\sigma' \to \{\sigma, \bar\sigma\})] W(\{\sigma', \bar\sigma'\}) Z(\{\sigma', \bar\sigma'\}, t) \tag{44} $$

where $p(\sigma' \to \{\sigma, \bar\sigma\})$ is the probability that parent strand $\sigma'$ produces $\{\sigma, \bar\sigma\}$
and the first term represents the loss of the original genome. For the sharp 
peak landscape, the error threshold occurs at 

which saturates for $W_0 \to \infty$ unlike (22), so that the loss of the master
sequence cannot be avoided by increasing its selective advantage. This can
be traced back to the destruction of the parent genome in the semiconservative 
case, which implies that, at sufficiently high mutation probability per genome, 
increasing the reproduction rate of the master sequence actually accelerates 
its extinction. 

Dynamic landscapes 

The assumption of a static fitness landscape is good when evolution occurs
on short time scales or in long-term, controlled experiments in the laboratory.
However, natural populations are usually subjected to dynamic environments
such as that of pathogens living in a host with a dynamic immune system. For
the problem of formation of quasispecies in the presence of a dynamic sharp
peak landscape, two cases need to be distinguished: one when the fitness
$W_0$ of the master sequence $\sigma_0$ is fixed but its location shifts at periodic time
intervals of length $\tau$ to a nearest neighbor [53], and the other in which the
location is kept fixed but the height of the peak changes with time [54, 55].

In the former case, besides the usual upper limit on the mutation rate,
an analytical approximation of the model shows the existence of a lower limit
also [53]. The latter arises because when the peak shift occurs, at least one
individual should be present at the new location so that it can replicate and
form the quasispecies. For too low mutation rates, this may not happen and
this effect is likely to be more pronounced for finite populations.

In the case of a time-dependent peak height $W_0(t)$ of the master sequence,
the characteristic time scale $\tau$ of variation of the fitness landscape must be
compared to the response time of the population, which is the inverse of the
relative growth rate of the master sequence compared to its mutants. When
$\tau$ is large compared to the response time the population fraction at the
master sequence follows the landscape quasistatically. For rapidly changing
landscapes the time-averaged population undergoes an error threshold transition
at the mutation strength $\mu_c$ given by [54, 55]



which generalises (22) by replacing the static fitness by an average over a
time interval of length $T \gg \tau$. For periodic $W_0(t)$ with period $\tau$ the fraction
$X(\sigma_0, t)$ also changes periodically with the same period but with a phase shift
that increases with decreasing $\tau$ [55]. Due to this time lag, the master sequence
achieves maximum population when its fitness has already dropped from the
maximum amplitude.

Parental effects 

Digital organisms are computer programs with a set of instructions (genome) 
including copy commands due to which they can be replicated. During the 
copying process, some instructions can get deleted, repeated or replaced. An 
evolved program can perform complex logic operations by using a simple logic 
operator available to it. Such complex organisms are selected by allotting them 
more CPU time thus increasing their replication rate defined as the ratio of 
the number of logical instructions that they can execute to the number of 
instructions that they have to perform in order to produce a new program 
[18]. While the latter depends on the individual's own genome, the CPU time 
available to it is a parental influence. 




(46) 
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The situation is analogous to the case of biological organisms which obtain 

proteins etc. from the parent besides the genome. In such a case, the fraction
$X(\sigma', \sigma, t)$ of population at sequence $\sigma$ with ancestor $\sigma'$ evolves as [56]

$$ \dot{X}(\sigma', \sigma, t) = \sum_{\sigma''} A(\sigma'') W(\sigma') p(\sigma' \to \sigma) X(\sigma'', \sigma', t) - f(t) X(\sigma', \sigma, t) \tag{47} $$

where $f(t) = \sum_{\sigma'', \sigma'} A(\sigma'') W(\sigma') X(\sigma'', \sigma', t)$ and $A(\sigma')$ is the contribution
to the fitness from the ancestor. In the absence of parental effects, $A(\sigma) = 1$
for all sequences and the original equation (12) is obtained for
$X(\sigma, t) = \sum_{\sigma'} X(\sigma', \sigma, t)$. This can be generalised by weighting the population
variable by the parental contribution and defining the normalised variable

$$ \tilde{X}(\sigma, t) = \sum_{\sigma'} A(\sigma') X(\sigma', \sigma, t) \Big/ \sum_{\sigma', \sigma''} A(\sigma') X(\sigma', \sigma'', t) \tag{48} $$

which reduces the variables in (47) to . Interestingly, in the steady 
state the population $\tilde{X}(\sigma, t)$ defined in (48) obeys the quasispecies
equation (12) with fitness $A(\sigma)W(\sigma)$. Thus the available results for the
standard quasispecies model can be directly applied to this case. In
particular, for the sharp peak landscape the fraction $X(\sigma_0, \sigma_0)$ at the master
sequence increases (relative to the null case when there are no parental
effects) if the ancestral fitness $A(\sigma_0) > A(\sigma)$ for $\sigma \neq \sigma_0$. It is also
possible to obtain the opposite trend if the ancestral effect is deleterious and
has to be compensated by the fitness of the individual itself, such as when
$A(\sigma_0) < A(\sigma)$ and $W(\sigma_0) > W(\sigma)$.

Heterogeneous mutations 

The accuracy of replication depends on enzymes called polymerases which can
be present in different types with their respective accuracies. For example, as
discussed by E. Lazaro elsewhere in this book, RNA virus strains that show
resistance to certain mutagens may possess polymerases with a particularly
high copying fidelity. In the presence of $p$ polymerases with concentrations
$c_k$ and replication error $\mu_k$, $k = 1, \ldots, p$, the mutation probability (3)
generalises to

$$ p(\sigma' \to \sigma) = \sum_{k=1}^{p} c_k \left( \frac{\mu_k}{\ell - 1} \right)^{d(\sigma, \sigma')} (1 - \mu_k)^{N - d(\sigma, \sigma')}. \tag{49} $$

One may expect that by increasing the concentration of the polymerase with
low error rate, the error threshold can be increased (even to infinity). That
this indeed is the case was demonstrated in [57] for $p = 2$ with concentration
$c$ of an error-free polymerase with replication error probability $\mu_1 = 0$ and
$1 - c$ of an error-prone polymerase with $\mu_2 = \mu > 0$. For the sharp peak
landscape, one can find the fraction
$X(\sigma_0) = [W_0 p(\sigma_0 \to \sigma_0) - 1]/(W_0 - 1)$ of the master sequence by neglecting the
back mutations as before, where $p(\sigma_0 \to \sigma_0) \approx c + (1 - c)e^{-\mu N}$ for $\mu \to 0$
and $N \to \infty$. Then the master sequence can localise the population if

$$ \mu < \mu_c = \frac{1}{N} \ln \left[ \frac{(1 - c) W_0}{1 - c W_0} \right], \tag{50} $$

which reduces to (22) for $c = 0$ as expected and increases with increasing $c$.
Since the argument of the logarithm should be positive for real $\mu_c$, it follows
that $c < c' = 1/W_0$ and on exceeding $c'$, the master sequence continues to
localise the population for any mutation rate.
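
The dependence of the threshold on the concentration $c$ of the error-free
polymerase follows from setting the master fraction to zero. The sketch below
assumes $X(\sigma_0) = [W_0 p - 1]/(W_0 - 1)$ with $p = c + (1-c)e^{-\mu N}$, which gives
$\mu_c = N^{-1}\ln[(1-c)W_0/(1 - cW_0)]$ for $c < 1/W_0$; the function names and the
parameter values are illustrative and not taken from [57].

```python
import math

def master_fraction(mu, N, W0, c):
    """Master-sequence fraction, neglecting back mutations."""
    p_no_error = c + (1.0 - c) * math.exp(-mu * N)
    return max((W0 * p_no_error - 1.0) / (W0 - 1.0), 0.0)

def critical_mutation(N, W0, c):
    """Threshold where the master fraction vanishes; infinite once c >= 1/W0."""
    if c * W0 >= 1.0:
        return math.inf
    return math.log((1.0 - c) * W0 / (1.0 - c * W0)) / N

N, W0 = 100, 2.0
for c in (0.0, 0.2, 0.4, 0.5):
    print(f"c = {c:.1f}  mu_c = {critical_mutation(N, W0, c)}")
```

With $W_0 = 2$ the threshold grows with $c$ and diverges at $c' = 1/W_0 = 0.5$,
beyond which the error-free polymerase alone keeps the population localised.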

4 Complex fitness landscapes 

We now turn our attention to "complex" landscapes which do not possess 
the symmetries of the simple ones discussed in the last section. Realistic 
landscapes are expected to have hills, valleys, basins and ridges [19]. A pic- 
torial representation of such a rugged fitness landscape drawn over a two- 
dimensional plane is shown in Fig. 4. Despite the intuitive appeal of such pic- 
tures, however, it should be kept in mind that they are metaphors rather than 
models of biological reality. Real fitness landscapes extend over the very high 
dimensional, discrete space of genotype sequences, and there are indications 
that the intuition gained in our experience with low-dimensional landscapes
fails when applied to such abstract objects [2]. 



Fig. 4. Schematic representation of a rugged fitness landscape defined over a two- 
dimensional genotype space. 



Researchers trying to construct realistic fitness landscapes have followed 
one of two basic approaches. One approach is to study simple model systems 
for which the mapping from genotype to phenotype can be carried out ex- 
plicitly. This has been pursued in great detail for the case of RNA sequences, 
which will be briefly described in Sect. 4.1, as well as for proteins: for a de- 
tailed discussion we refer to the chapters by P. Schuster and P. Stadler, and 
by U. Bastolla, M. Porto, H. E. Roman and M. Vendruscolo in this book. The 
second approach, which was conceptually inspired by the statistical physics 
of disordered systems [58, 59], is to regard a given fitness landscape as the
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realization of an ensemble of random functions with prescribed statistical 
properties. In this case an important quantity characterising the ruggedness
of the landscape is the correlation coefficient $\rho(d, N)$ between the fitnesses
of two genotypes at Hamming distance $d$, which is defined as

$$ \rho(d, N) = \frac{\langle w(\sigma) w(\sigma') \rangle - \langle w(\sigma) \rangle^2}{\langle w(\sigma)^2 \rangle - \langle w(\sigma) \rangle^2}, \qquad d = d(\sigma, \sigma'). \tag{51} $$

Here the angular brackets stand for an average over the ensemble of landscape
configurations and the denominator ensures that $\rho(0, N)$ is scaled to unity.
We have defined (51) in terms of Malthusian fitness, but the Wrightian case
can be treated in the same way. Examples of random fitness landscapes will
be discussed in Sections 4.2, 4.3 and 4.4.



4.1 An explicit genotype-phenotype map for RNA sequences 

For the description of evolution experiments with self-replicating RNA
molecules (see Sect. 6.1), it is natural to assume that the fitness of a given
RNA sequence depends only on the three-dimensional shape that the molecule
folds into in solution. As an approximation to the full three-dimensional shape
(the tertiary structure of the molecule), its secondary structure, defined as
the set of allowed base pairings that satisfies the no-knot constraint and
minimises the free energy, can be used. In contrast to the tertiary structure,
the secondary structure can be computed from the sequence by efficient
algorithms. Although this does not yet solve the problem of how to assign a
fitness to the genotype, it allows one to study in great detail the mapping from
the genotype (the sequence) to the phenotype (the secondary structure)
[20, 60, 61].

The most important feature of this mapping is that it is many-to-one.
Indeed, the number of secondary structures of random RNA sequences of
length $N$ behaves asymptotically as [61]

$$ \mathcal{N}_{\mathrm{RNA}} \approx 1.4848 \times N^{-3/2} \times (1.8488)^N, \tag{52} $$

whereas the number of sequences is $4^N$. Thus exponentially many sequences
fold into the same secondary structure for large $N$. Since sequences with the
same secondary structure must be assigned the same fitness, it follows that the
fitness landscape contains large regions of constant fitness, which are
therefore selectively neutral. Typically there are a few common structures
(which are represented by many sequences) and many more rare ones, with the
distribution of the number of sequences mapping to a given structure following
a power law. The most common structures form neutral networks extending
throughout sequence space, such that any randomly chosen sequence is
close to a sequence on this network. Similar networks have also been found
in the sequence space of proteins [62, 63, 64]; see the chapter by U. Bastolla,
M. Porto, H.E. Roman and M. Vendruscolo in this book. Some aspects of
the evolutionary process on such neutral networks will be discussed below in
Sections 4.4 and 5.1.
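
The degree of degeneracy implied by the structure count can be estimated by
comparing it with the number $4^N$ of sequences. The sketch below assumes the
asymptotic estimate in the form $1.4848 \times N^{-3/2} \times (1.8488)^N$ and works with
base-10 logarithms to avoid overflow; the function names are illustrative.

```python
import math

def log10_num_structures(N):
    """log10 of the asymptotic number of secondary structures for length N."""
    return math.log10(1.4848) - 1.5 * math.log10(N) + N * math.log10(1.8488)

def log10_sequences_per_structure(N):
    """log10 of the average number of sequences folding into one structure."""
    return N * math.log10(4.0) - log10_num_structures(N)

for N in (30, 100, 300):
    print(f"N = {N:3d}: about 10^{log10_sequences_per_structure(N):.1f} sequences per structure")
```

Since the ratio grows roughly like $(4/1.8488)^N$, even the average structure is
realised by an astronomically large number of sequences at moderate lengths.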
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4.2 Uncorrelated random landscapes 

The simplest kind of random fitness landscape is the uncorrelated landscape
where the fitnesses are independent random variables drawn from some common
probability distribution [59]. In this case the correlation function (51)
reduces to $\rho(d, N) = \delta_{d,0}$. An example from this class is the Random Energy
Model (REM) of spin glass theory [65, 66, 67], for which the (Wrightian)
fitness is given by

$$ W(\sigma) = \exp[\kappa E(\sigma)], \tag{53} $$

where the "energies" $E$ are independent Gaussian random variables with
distribution

$$ P(E) = \frac{1}{\sqrt{\pi N}} \exp(-E^2/N), \tag{54} $$

and $\kappa$ is an "inverse selective temperature".

This model displays a phase transition which is quite similar to the error
threshold in the single peak landscape. At high mutation rates the population
is delocalised while at low mutation rates it is frozen into the master sequence,
which in this case is simply the sequence $\sigma_{\max}$ with the largest value $E_{\max}$
of $E(\sigma)$ in the particular realization (the "ground state" configuration of the
REM). The scaling with $N$ in (54) is chosen such that this maximal value
is proportional to $N$, $E_{\max} = N\sqrt{\ln 2}$ to leading order. At the transition the
mean overlap (25) jumps discontinuously from one to zero [41].

The critical mutation probability required for delocalisation can be computed
along the lines used in Sect. 3.2 for the sharp peak landscape. Neglecting
back mutations to $\sigma_{\max}$, a nonzero population fraction $X(\sigma_{\max})$ is maintained
if the product of $W_{\max} = \exp[\kappa E_{\max}]$ with the probability $(1 - \mu)^N$ of
producing an error-free offspring is greater than the mean population fitness
$\bar{W}$ in the delocalised phase [12]. The latter is obtained by averaging (53) with
respect to the distribution (54), which yields $\bar{W} = \exp[\kappa^2 N/4]$. Comparing
the two expressions, one finds [12, 41, 67]

$$ \mu_c = 1 - \exp[\kappa^2/4 - \kappa\sqrt{\ln 2}]. \tag{55} $$

The critical mutation probability reaches its maximal value $\mu_c = 1/2$ at the
value $\kappa_c = 2\sqrt{\ln 2}$ of the inverse selective temperature, which coincides with
the glass transition of the REM [65]. For $\kappa > \kappa_c$ the selective advantage of
the most fit sequence is so great that it dominates the population even in the
limiting case $\mu = 1/2$, when a complete reshuffling of genotypes occurs in each
generation.

We note that, in contrast to most examples discussed in Sect. 3, the
expression (55) is independent of the sequence length $N$. This is a consequence
of the scaling of the random energies in (54). Indeed, this scaling implies that
the ratio $W_{\max}/W_{\min} = \exp[2\kappa E_{\max}]$ grows exponentially in $N$, and hence the
right hand side of (38) is independent of N.
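
The threshold of the REM landscape is easy to evaluate numerically. The sketch
below assumes the result in the form $\mu_c = 1 - \exp[\kappa^2/4 - \kappa\sqrt{\ln 2}]$,
obtained from $W_{\max}(1-\mu)^N = \bar{W}$ with $E_{\max} = N\sqrt{\ln 2}$ and
$\bar{W} = \exp[\kappa^2 N/4]$. Capping the value at $1/2$ for $\kappa > \kappa_c$ is an
interpretation of the text above, not a formula from the cited references.

```python
import math

SQRT_LN2 = math.sqrt(math.log(2.0))
KAPPA_C = 2.0 * SQRT_LN2   # inverse selective temperature at the glass transition

def critical_mutation_rem(kappa):
    """Critical mutation probability for inverse selective temperature kappa >= 0."""
    if kappa >= KAPPA_C:
        return 0.5   # assumption: the master sequence dominates up to mu = 1/2
    return 1.0 - math.exp(kappa ** 2 / 4.0 - kappa * SQRT_LN2)

for fraction in (0.0, 0.25, 0.5, 0.75, 1.0):
    kappa = fraction * KAPPA_C
    print(f"kappa = {kappa:.3f}  mu_c = {critical_mutation_rem(kappa):.4f}")
```

Note that the sequence length does not appear anywhere in the function, which
reflects the $N$-independence of the threshold discussed above.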
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4.3 Correlated landscapes 

An example of a random fitness landscape with correlations can be constructed
from the Sherrington-Kirkpatrick (SK) spin glass model, which is defined by
the energy function

$$ E_{\mathrm{SK}}(\sigma) = \frac{1}{N} \sum_{i<j} J_{ij} \sigma_i \sigma_j. \tag{56} $$

Here $\sigma_i = \pm 1$ and the $J_{ij}$ are independent Gaussian random variables with
zero mean and unit variance. A similar energy function arises for the graph
bipartitioning problem (GBP) discussed in Sect. 2.1,

$$ E_{\mathrm{GBP}}(\sigma) = -\sum_{i<j} J_{ij} \sigma_i \sigma_j, \tag{57} $$

where the spins satisfy the vanishing total spin constraint. In this case
$J_{ij} = J > 0$ if the sites $i$ and $j$ are connected by an edge of the graph, and
$J_{ij} = 0$ otherwise [22, 42]. Through (53), the energy functions (56) and (57) can
be directly interpreted as Malthusian fitness landscapes [58, 66, 42]. They
belong to a large class of random landscapes for which the correlation function
behaves as [60]

$$ \rho(d, N) \approx 1 - a_1 \frac{d}{N} + O\left( \left( \frac{d}{N} \right)^2 \right) \tag{58} $$

for $N, d \to \infty$ but $d/N \ll 1$, with a constant $a_1$ which is independent of $N$. The
significance of this behavior becomes clear if we interpret $d/N$ as a continuous
variable: For random functions of a real variable, the linear dependence of
the correlation function for small arguments is typical of a non-differentiable
process with independent increments (such as Brownian motion), whereas for
a differentiable random process the correlation function varies quadratically
at small distances. In this sense the linear behavior in (58) is indicative of the
ruggedness of the landscape.

A simple modification of the argument leading to (55) gives some insight
into how the fitness correlations affect the location of the error threshold [42].
We assume that in the localised phase the bulk of the population is located at
some distance $d^* = O(1)$ from the most fit genotype $\sigma_{\max}$, with corresponding
energy values $E \approx \rho(d^*) E_{\max}$. Equating the resulting mean population fitness
$\bar{W} = \exp[\kappa E]$ to the product $(1 - \mu)^N W_{\max}$ and using (58) then yields, for
large $N$, the estimate

$$ \mu_c \approx \frac{\kappa a_1 d^* E_{\max}}{N^2}. \tag{59} $$

Together with the scaling of the ground state energy as $E_{\max}^{\mathrm{SK}} \sim N^{1/2}$ and
$E_{\max}^{\mathrm{GBP}} \sim N^{3/2}$ for the SK-model and the GBP, it follows that
$\mu_c^{\mathrm{SK}} \sim N^{-3/2}$ and $\mu_c^{\mathrm{GBP}} \sim N^{-1/2}$, respectively, in agreement with
simulations [42].

Fig. 5. Correlation function (61) for the NK-model with N = 20.



A family of random landsc;apc's in which the ruggcdncss can be tuned are 
the NK landscapes^ introduced by Kauffman and Levin [5, 68]. In this model, 
the Malthusian fitness^° of a genotype is written as a sum of contributions 
from the N loci, 



where each Wi is a function of ctj and K other loci chosen at random^ ^. The 
number of possible states of CTj and its K chosen neighbors is then (.^^^ , and 
each of these states is assigned a random fitness drawn from some continuous 
probability distribution. For large N the additive form of (60) ensures that 
the w{(j) become Gaussian by virtue of the central limit theorem. 

For K = 0 the loci are independent, and the model becomes equivalent to
the multiplicative fitness landscape without epistasis discussed in Sect. 3.4;
in particular, there is a unique fitness peak. At the other extreme K = N − 1,
the $w_i$ are independent random variables and the model reduces to the
uncorrelated landscape of Sect. 4.2. With increasing K the number of fitness
maxima increases and their height decreases [68], and the correlation function
is given by^12 [71]

^9 A related family was defined in [66] in analogy to Derrida's p-spin model of
spin glasses [65].

^10 A Wrightian version of the model is discussed in [69].

^11 Other schemes for choosing the interacting loci are described in [17, 60, 69].

^12 The expressions for the correlation function given in [60, 70] are incorrect,
because it is not taken into account that the d mutations separating the two
genotypes in (51) must affect different sites in the sequence.




$$\rho(d,N) = \begin{cases} \dfrac{(N-K-1)!\,(N-d)!}{N!\,(N-K-1-d)!} & : d \le N-K-1 \\ 0 & : \text{else.} \end{cases} \tag{61}$$

This shows how the correlations decay more rapidly with increasing epistasis
(increasing K), and reduces to $\rho(d, N) = 1 - d/N$ for K = 0 and
$\rho(d, N) = \delta_{d,0}$ for K = N − 1, respectively (see Fig. 5).
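As a rough illustration of the NK construction and of the limiting cases of the correlation function (61), the following Python sketch may be useful. It assumes binary sequences, randomly chosen interaction partners and Gaussian-distributed contributions; the function names `nk_correlation` and `make_nk_landscape` are hypothetical and not taken from the cited literature.

```python
import math
import random


def nk_correlation(d, N, K):
    """Correlation function (61) at Hamming distance d."""
    if d > N - K - 1:
        return 0.0
    num = math.factorial(N - K - 1) * math.factorial(N - d)
    den = math.factorial(N) * math.factorial(N - K - 1 - d)
    return num / den


def make_nk_landscape(N, K, rng):
    """Return a function w(sigma) for binary sequences given as tuples of 0/1."""
    # Assumption: each locus i interacts with K distinct, randomly chosen loci.
    partners = [rng.sample([j for j in range(N) if j != i], K) for i in range(N)]
    tables = [dict() for _ in range(N)]  # lookup tables, filled on demand

    def w(sigma):
        total = 0.0
        for i in range(N):
            state = (sigma[i],) + tuple(sigma[j] for j in partners[i])
            if state not in tables[i]:
                # Assumption: contributions are standard Gaussian variables.
                tables[i][state] = rng.gauss(0.0, 1.0)
            total += tables[i][state]
        return total

    return w


if __name__ == "__main__":
    w = make_nk_landscape(N=8, K=2, rng=random.Random(1))
    print("fitness of the all-zero sequence:", w((0,) * 8))
    print("rho(d, 20) for K = 4:", [nk_correlation(d, 20, 4) for d in range(6)])
```

For K = 0 the function `nk_correlation` returns 1 − d/N, and for K = N − 1 it vanishes for every d > 0, in line with the two limits quoted above.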

Another model with tunable correlations was introduced in a study of
evolutionary dynamics in the limit of infinite genome size but with a finite
population [72]. For $N \to \infty$ every mutation creates a genotype that has
not been previously represented in the population. The fitnesses can then be
created "on the fly" according to the transition probability

$$\mathrm{Prob}[w(\sigma)\,|\,w(\sigma')] \sim \exp[-(w(\sigma) - \lambda^{d} w(\sigma'))^{2}], \tag{62}$$

where $d = d(\sigma, \sigma')$ and the parameter $0 < \lambda < 1$ determines the decay of the
correlations as $\rho(d,N) \sim \lambda^{d}$.



4.4 Neutrality 

We have seen above in Sect. 4.1 that realistic fitness landscapes obtained from
mapping sequences to structures contain extended regions that are selectively
neutral. It has been argued that this is a general feature of high-dimensional
fitness landscapes, which has important consequences for the way in which
evolutionary dynamics should be visualized [2]. Rather than consisting of
valleys and hilltops, as suggested by the low-dimensional rendition in Fig. 4,
such a holey landscape would display a network of ridges of approximately
constant fitness, along which a population can travel large genetic distances
without ever having to cross an unfavorable low-fitness region^13.

Several properties of the stationary population distribution for the
quasispecies model on a neutral network can be inferred without specifying the
precise structure of the network [73]. It is only assumed that the viable
genotypes make up a connected graph $\mathcal{G}$ of constant fitness, which is
surrounded by genotypes that are lethal or at least of very low fitness.
Mutations are restricted to nearest neighbor sequences. Then the key observation
is that the stationary population distribution $X(\sigma)$ on the network^14 is the
principal eigenvector of the adjacency matrix of the graph, which is a matrix
that has unit entries for pairs of viable sequences that are connected by a
single point mutation, and zero entries otherwise. The corresponding eigenvalue
$\Lambda$ is equal to the population neutrality $\langle \nu \rangle$,

$$\Lambda = \langle \nu \rangle = \sum_{\sigma \in \mathcal{G}} \nu(\sigma) X(\sigma), \tag{63}$$

where $\nu(\sigma)$ is the number of viable neighbors of sequence $\sigma$ (the degree of the
corresponding node of $\mathcal{G}$). The weighting by the population fraction $X(\sigma)$ in
(63) is significant: For any graph $\mathcal{G}$ the principal eigenvalue of the adjacency
matrix satisfies the bounds [75]

$$\bar{\nu} \le \Lambda \le \nu_{\max}, \tag{64}$$

where $\bar{\nu}$ and $\nu_{\max}$ denote the average and maximal degrees of the graph. For
a random graph with a range of degrees the relations (63) and (64) imply
that generally $\langle \nu \rangle > \bar{\nu}$, which shows that the population preferentially
resides at nodes where the number of viable neighbors is larger than on average.
This has been referred to as the evolution of mutational robustness [73]. The
heterogeneity of the node degree along the neutral network has important
consequences also for the evolutionary dynamics, because it induces strong
fluctuations in the rate of neutral substitutions [63, 64, 74].

^13 The evolutionary importance of paths of viable genotypes that connect distant
points in sequence space was emphasized by Maynard Smith [3]. He illustrates
the issue with a game where the goal is to transform one word into another by
changing one letter at a time, with the requirement that all intermediate words
are meaningful (i.e., "viable"). An example is the path WORD → WORE →
GORE → GONE → GENE.

^14 The population on the network is normalized to unity,
$\sum_{\sigma \in \mathcal{G}} X(\sigma) = 1$, which does not include the individuals in the lethal
region. Although these individuals do not reproduce, they constitute a finite
fraction of the population which is replenished by mutations from viable
genotypes.
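The relation between the principal eigenvalue of the adjacency matrix, the population neutrality and the mean degree can be checked numerically on a small example. The sketch below is only illustrative: the seven-node network of binary sequences is an arbitrary choice, the function name `neutral_network_stats` is hypothetical, and the principal eigenvector is obtained by a simple power iteration on the adjacency matrix shifted by the identity (the shift is a numerical device to avoid oscillations on bipartite graphs).

```python
def flip(seq, pos):
    """Return seq with the binary letter at position pos mutated."""
    return seq[:pos] + (1 - seq[pos],) + seq[pos + 1:]


def neutral_network_stats(viable, iterations=2000):
    """Principal eigenvalue, population neutrality and degrees of a network."""
    nodes = list(viable)
    index = {s: i for i, s in enumerate(nodes)}
    # Neighbor lists: viable sequences one point mutation away.
    nbrs = [[index[flip(s, p)] for p in range(len(s)) if flip(s, p) in index]
            for s in nodes]
    degrees = [len(n) for n in nbrs]
    x = [1.0 / len(nodes)] * len(nodes)
    for _ in range(iterations):
        # Power iteration with the shifted matrix A + I.
        y = [x[i] + sum(x[j] for j in nbrs[i]) for i in range(len(nodes))]
        norm = sum(y)
        x = [v / norm for v in y]  # normalized to unit total population
    ax = [sum(x[j] for j in nbrs[i]) for i in range(len(nodes))]
    eigenvalue = sum(a * b for a, b in zip(x, ax)) / sum(v * v for v in x)
    neutrality = sum(k * v for k, v in zip(degrees, x))  # weighted mean degree
    return eigenvalue, neutrality, degrees, x


# Arbitrary connected network of viable binary sequences of length 4.
VIABLE = [(0, 0, 0, 0), (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0),
          (0, 0, 0, 1), (1, 1, 0, 0), (1, 1, 1, 0)]

if __name__ == "__main__":
    lam, nu_pop, deg, x = neutral_network_stats(VIABLE)
    print("principal eigenvalue:", lam)
    print("population neutrality:", nu_pop)
    print("mean degree:", sum(deg) / len(deg), "max degree:", max(deg))
```

In this example the mean degree is 2 and the maximal degree is 4, so the computed eigenvalue should fall between these two values and coincide with the population-weighted mean degree.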

Neutral networks can be modeled as random subgraphs in sequence space.
Such subgraphs are defined through a simple modification of the uncorrelated
landscape model of Sect. 4.2, where each sequence $\sigma$ is randomly assigned
fitness $W(\sigma) = 1$ (viable) with probability P and $W(\sigma) = 0$ (lethal) with
probability 1 − P. Each connected region of viable genotypes then constitutes
a random subgraph. For small P these regions are small and isolated, but at
the percolation threshold $P = P_c$ given by

= (7^ 

a giant network appears which spans the sequence space and which, for P > 
Pc, contains a finite fraction of all sequences [2, 76]. Since is a large number, 
the fraction of viable genotypes needed to create such a spanning network is 
remarkably small [3] . 

For subgraphs of the binary hypercube ($\ell = 2$) with random assignment
of links (rather than sites) it has been shown that the principal eigenvalue of
the adjacency matrix is asymptotically given by [75]

$$\Lambda \approx \max[NP, \sqrt{\nu_{\max}}]. \tag{66}$$

Taking $N \to \infty$ at fixed P one finds that $\nu_{\max} \approx N$, so that
$\Lambda \to NP = \bar{\nu}$. In this limit the neutral network behaves like a regular graph,
and no significant mutational robustness develops. On the other hand, if
$P \to 0$ as $N \to \infty$ with NP fixed, one obtains $\nu_{\max} \sim N/\ln N \gg \bar{\nu}$, and the
mutational robustness effect is significant.
5 Dynamics of adaptation

In this section we turn our attention to time-dependent aspects of the adaptive 
process. In rugged fitness landscapes the population is faced with the task 
of reaching ever higher fitness peaks by traversing fitness valleys or neutral 
networks, which typically gives rise to a pattern of episodic or punctuated 
evolution. This phenomenon will be discussed in general terms in the following 
subsection, and a specific model study [77] will be summarized in Sect. 5.2. In 
the final subsection we describe an approach to evolutionary dynamics that 
is suited for landscapes that are smooth, in the sense that a simple (linear) 
relation between fitness and genetic distance can be assumed. 

5.1 Peak shifts and punctuated evolution 

The existence of multiple fitness peaks of different height, as illustrated in 
Fig. 4, immediately suggests that evolutionary histories should generally dis- 
play two distinct regimes: Periods of stabilizing selection, where the popu- 
lation resides near a local fitness maximum, and peak shifts in which the 
population moves quickly from one fitness peak to another of greater height. 
The stationary distributions in a single peak landscape that were discussed 
at length in Sect. 3 can be viewed as an approximate description of the first 
regime. The necessity of peak shifts for explaining the succession of
biological forms in the paleontological data has been recognized for a long
time^15, but the underlying mechanisms (and even the relevance of the concept
itself) remain controversial.

Mathematical analysis of peak shifts driven by stochastic fluctuations in
finite populations (genetic drift) generally shows that the waiting time for
the shift is vastly larger than the time required for the transition itself [79,
80]. This can be argued to support the scenario of punctuated equilibrium in
macroevolution [81], which states that evolutionary changes (including both
speciation and phenotypic changes within a lineage) occur during relatively
short time intervals which are separated by long periods of no discernible
change (stasis).

However, for realistic population sizes the stochastically driven peak shifts
may be far too rare to be relevant, and in fact they may not be needed at
all, if the picture of a holey landscape spanned by a network of neutral ridges
described in Sect. 4.4 is generally applicable [2]. Evolution in such a landscape
will nevertheless be punctuated, because a population moving by genetic drift
across a neutral network can increase its fitness only by finding a path to
another network of higher fitness. If these paths are rare, a natural separation
of time scales between (phenotypic) stasis and sudden fitness jumps arises.
This scenario is well established for simulations of in vitro evolution of RNA
sequences [20, 82]. Borrowing a concept from statistical physics, it can be said
that in this case the population is confined by entropic barriers rather than
by fitness barriers [83].

^15 A famous example is the transition from browsing to grazing behavior in
equids [78].

5.2 Evolutionary trajectories for the quasispecies model 

In the deterministic mutation-selection models of interest in this chapter, 
stochastic fluctuations cannot be invoked to drive peak shifts. Nevertheless 
a population initially placed near one fitness peak in a multi-peaked land- 
scape is able to relocate to a higher peak, by developing tails of mutants 
which (since the number of individuals is formally infinite) with time explore 
the entire sequence space. Once a small mutant population has been estab- 
lished at the distant fitness peak, it starts to compete with the majority at 
the original peak and, if the newly populated peak is higher, it will eventually 
come to dominate the population. In this way the majority of the population 
can shift between peaks without ever actually having to traverse a fitness 
valley (Fig. 6). 




Fig. 6. Example of a peak shift event for quasispecies dynamics with binary
sequences of length N = 9 in an uncorrelated random fitness landscape (two
panels; horizontal axis: sequence number). At time t = 8 the most populated
sequence is near the origin (sequence number 1), but at time t = 14 it has moved
to a sequence number close to 300. The peaks of lower height represent the
first- and second-neighbor mutants of the most populated sequence. They are
not adjacent because of the linear arrangement of the sequences.



The time tx required for a single peak shift in the discrete time quasis- 
pecies model has been estimated numerically for a simple degenerate two-peak 
landscape, given by (32) with Wn = Wq and Wn-i = 1 [84]. The population 
was first allowed to equilibrate in a single peak landscape and then the second 
peak was turned on. The result is 
which, somewhat surprisingly, has the same form as the time required for a
finite population to cross a fitness valley [83]; of course in the latter case there
is an additional dependence on the population size.

The evolutionary trajectories that result from multiple peak shifts in an
uncorrelated rugged fitness landscape have been studied in detail in a strong
selection limit motivated by the zero temperature limit of the statistical
physics of disordered systems [77, 85, 86, 87]. Writing

$$Z(\sigma,t) = e^{\kappa F(\sigma,t)}, \quad W(\sigma) = e^{\kappa E(\sigma)}, \quad \mu = e^{-\kappa}, \tag{68}$$

with $\kappa$ denoting the inverse selective temperature (see Sect. 4.2), and starting
with an initial condition $Z(\sigma,0) = \delta_{\sigma,\sigma^{(0)}}$ where $\sigma^{(0)}$ is a randomly chosen
sequence, the dynamics takes the following form in the $\kappa \to \infty$ limit:

$$F(\sigma,t+1) = \max_{\sigma'}[F(\sigma',t) + E(\sigma') - d(\sigma,\sigma')], \quad t \ge 2 \tag{69}$$

$$F(\sigma,1) = E(\sigma^{(0)}) - d(\sigma,\sigma^{(0)}). \tag{70}$$

Here the logarithmic fitnesses $E(\sigma)$ are independent random variables chosen
from a common distribution $p(E)$. As already discussed in Sect. 4.2, one
expects the whole population to be localised at the fittest genotype in the large
time limit. At any finite time, in the strong selection limit, the population can
be identified with the most populated genotype. The behavior of this genotype
is essentially unaffected by dropping the mutation term for times t > 1,
so that the dynamics reduces to [86]

$$F(\sigma,t) = F(\sigma,1) + (t-1)E(\sigma), \quad t \ge 2. \tag{71}$$

This illustrates the fact that, after the entire sequence space has been "seeded"
by mutants of the original genotype $\sigma^{(0)}$ at time t = 1, the subsequent
evolution consists in the competition of independent populations located at the
fitness peaks. Distant peaks of high fitness are disadvantaged by a small initial
population but may come to dominate at later times.

Since the seeding population $F(\sigma, 1)$ of a sequence only depends on its
distance from the initial genotype $\sigma^{(0)}$, within each shell of constant
$k = d(\sigma, \sigma^{(0)})$ only the most fit genotype is a contender for global leadership.
Thus the dynamics of the $\ell^N$ variables (71) can be reduced to N + 1 shell
population variables $F(k,t)$ whose fitnesses $E(k)$ are chosen from the distribution

$$p_k(E) = \alpha_k\, p(E) \left[ \int_{-\infty}^{E} p(x)\,dx \right]^{\alpha_k - 1}. \tag{72}$$

This is the distribution of the maximum among $\alpha_k$ independent random
variables with distribution $p(E)$, and $\alpha_k$ is the number of sequences in shell k,
as defined in (4).

The representation of the "evolutionary race" as a problem of crossing
straight lines^16 is illustrated in Fig. 7. At a given time t, the most populated
sequence located in shell $k^*$ leads until it is overtaken by a shell $k'$ with
$E(k') > E(k^*)$ and so on, until the global fitness maximum takes over. A
natural question of interest is to identify the sequences that take part in this
evolutionary trajectory, and to determine their number. It is clear that for
a sequence to participate in the trajectory it is necessary that it constitutes
a fitness record, in the sense that its fitness exceeds the fitnesses of all
sequences that are closer to $\sigma^{(0)}$. An analytical treatment of the statistics of
these independent but non-identically distributed records shows that the
average number of records encountered on the way to the global maximum is
[77]

(73)

for large N, and that essentially all records are located within the distance
$d_{\max} = N(\ell - 1)/\ell$ near which most of the sequences (including the most fit
sequence) reside. For $\ell = 2$, the inter-record spacing between the j-th and
(j + 1)-th record is of the order $\sqrt{N/j}$, where j = 1 labels the last record (the
global maximum). Thus, a few records separated by distances of order $\sqrt{N}$
occur near $d_{\max}$ and the rest are clustered away from it.

^16 The problem is related to models of highway traffic, where each vehicle is
equipped with a fixed random speed and overtaking is forbidden [88].

However, many records are bypassed by fitter sequences that arise further
away from $\sigma^{(0)}$ but manage to catch up with the current leader at an earlier
time. For unbounded fitness distributions with Gaussian or exponential tails,
the number of non-bypassed records (which is the number of sequences that
take part in a trajectory) is found to be only of order $\sqrt{N}$ with a uniform
spacing $\sim \sqrt{N}$, which suggests that the competition among the contenders is
strong when the average fitness of the population is still low. For fat-tailed
power law distributions the average number of records that are not bypassed
is asymptotically equal to unity, which implies that the population relocates
to the global fitness maximum in a single step.
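The distinction between records and non-bypassed records can be made concrete with a small simulation of the shell picture. The following Python sketch is a simplified illustration under stated assumptions: binary sequences (so that shell k contains the binomial coefficient "N choose k" sequences), exponentially distributed logarithmic fitnesses, and time treated as continuous when locating the crossing points of the lines $-k + (t-1)E(k)$. The function names are hypothetical.

```python
import math
import random


def shell_fitnesses(N, rng):
    """Draw E(k), the maximum of C(N, k) unit exponentials, for k = 0..N."""
    fitnesses = []
    for k in range(N + 1):
        alpha = math.comb(N, k)          # number of sequences in shell k
        u = max(rng.random(), 1e-300)    # guard against log(0)
        # Invert the distribution function (1 - exp(-E))**alpha = u.
        fitnesses.append(-math.log(-math.expm1(math.log(u) / alpha)))
    return fitnesses


def records(E):
    """Shells whose fitness exceeds that of all shells closer to the origin."""
    best, out = -math.inf, []
    for k, e in enumerate(E):
        if e > best:
            best = e
            out.append(k)
    return out


def trajectory(E):
    """Successive leaders among the lines -k + (t - 1) * E[k]."""
    leader, path = 0, [0]                # shell 0 leads at t = 1
    while True:
        best = None
        for k in range(leader + 1, len(E)):
            if E[k] > E[leader]:
                crossing = (k - leader) / (E[k] - E[leader])
                if best is None or crossing < best[0]:
                    best = (crossing, k)
        if best is None:                 # no fitter shell left: global maximum
            return path
        leader = best[1]
        path.append(leader)


if __name__ == "__main__":
    E = shell_fitnesses(30, random.Random(42))
    print("records:     ", records(E))
    print("trajectory:  ", trajectory(E))
```

Every shell on the trajectory is a record, but typically only a subset of the records appears on it; the remaining records are the bypassed ones.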



Fig. 8. Timing of evolutionary jumps.



Several statistical properties of the timing of peak shifts turn out to be
independent of the fitness distribution [77, 85, 86]. Specifically, denoting
by $T_j$ the time at which the jth peak shift occurs, with j = 1 denoting the
last shift (which reaches the global fitness maximum), j = 2 the penultimate
peak shift and so on (Fig. 8), the corresponding distributions display universal
power law tails

$$P_j(T_j) \sim (T_j)^{-(j+1)}. \tag{74}$$

In particular, the expected value of $T_1$ is infinite. The prefactors of these power
laws depend however on the fitness distribution and the sequence length,
in such a way that e.g. the typical value of $T_1$ tends to unity for fitness
distributions with a power law tail.

5.3 Dynamics in smooth fitness landscapes 

So far we have discussed landscapes in which the fitnesses can be very
different from each other and, as described above, the evolutionary trajectory
can change in a stepwise manner if the landscape has local maxima.
Smoothly varying landscapes for which the system does not get trapped in
such metastable states are the subject of the following discussion. Smoothness
will be taken to imply here that there is a simple (linear) relationship
between the fitness of a genotype and its genetic distance from the master
sequence. Individuals can then be characterized by their fitness alone, and
the description can be based on a one-dimensional fitness space [89].

The prototypical case in which this reasoning applies is that of the
multiplicative fitness landscape discussed in Sect. 3.4. We work in the Malthusian
setting and assume that the fitness $w(\sigma)$ is simply equal to the number of
mismatches with respect to the master sequence, $w = 0, \ldots, N$. Then the fraction
$Y(w,t)$ of individuals with fitness w at time t evolves as [89]

$$\dot{Y}(w,t) = (w - \bar{w})\,Y(w,t) + \tilde{\mu}\,[(w+1)\,Y(w+1,t) + (N-w+1)\,Y(w-1,t) - N\,Y(w,t)], \tag{75}$$

which is just the paramuse equation (7) evaluated for the present fitness
landscape, with $\bar{w}$ denoting the mean fitness of the population. For large N the
fitness w can be treated as a continuous variable. Setting $r = (w - N/2)/\sqrt{N}$,
$\mu = \tilde{\mu}/\sqrt{N}$ and $\tau = \sqrt{N}\,t$, Eq. (75) reduces for $N \to \infty$ to the drift-diffusion
equation

$$\frac{\partial Y}{\partial \tau} = (r - \bar{r})\,Y + \frac{\mu}{2}\frac{\partial^2 Y}{\partial r^2} + \frac{\partial}{\partial r}(2 \mu r Y). \tag{76}$$
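The discrete fitness-space equation (75) can be explored numerically with a simple explicit Euler scheme. The sketch below is illustrative only: the parameter values (N = 30, mutation rate 0.05, time step 0.001), the binomial initial condition and the function names are assumptions made for this example, and no population-size cutoff is included.

```python
import math


def mean_fitness(Y):
    """Mean fitness of a normalized distribution Y over w = 0..N."""
    return sum(w * y for w, y in enumerate(Y))


def euler_step(Y, mu, dt):
    """One explicit Euler step of Eq. (75) with mutation rate mu."""
    N = len(Y) - 1
    wbar = mean_fitness(Y)
    new = []
    for w in range(N + 1):
        gain_from_above = (w + 1) * Y[w + 1] if w < N else 0.0
        gain_from_below = (N - w + 1) * Y[w - 1] if w > 0 else 0.0
        rate = (w - wbar) * Y[w] + mu * (gain_from_above + gain_from_below
                                         - N * Y[w])
        new.append(Y[w] + dt * rate)
    return new


# Assumed example parameters and a binomial initial distribution.
N_SITES, MU, DT, STEPS = 30, 0.05, 0.001, 2000
Y_start = [math.comb(N_SITES, w) / 2 ** N_SITES for w in range(N_SITES + 1)]
Y_end = Y_start
for _ in range(STEPS):
    Y_end = euler_step(Y_end, MU, DT)

if __name__ == "__main__":
    print("initial mean fitness:", mean_fitness(Y_start))
    print("final mean fitness:  ", mean_fitness(Y_end))
    print("normalization:       ", sum(Y_end))
```

Both the selection term and the mutation term sum to zero over w, so the normalization of Y is preserved by the scheme up to rounding errors, while the mean fitness grows from its initial value N/2.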

Analysis of (75, 76) and related equations [90] shows that the mean fitness 
diverges in finite time, since the equations ignore the fact that at least one 
individual is required to initiate the reproduction process. This can be cir- 
cumvented by imposing a cutoff Yc inversely proportional to the population 
size, below which the selection term does not operate. 

With this modification, one finds that at short times, the population which
was initially spread over a fitness range gets localised about the maximum
available fitness, leading to a fast growth of average fitness. This is followed
by the collective motion of the localised "species" as a traveling wave with
constant speed and width (as long as the population is far from the boundaries
w = 0 and N of the fitness space). A finite population size analysis of
discrete models (described in the infinite population limit by the above
continuum equations) shows that both speed and variance of the wave diverge
linearly with increasing population size, which is consistent with the finite
time singularity that appears in the absence of a cutoff [90].

Quantitative agreement with finite population simulations requires a more 
careful treatment in which the most fit non-empty mutant class is treated 
stochastically, while keeping deterministic differential-difference equations of 
the type (75) for the remainder of the population. In addition, the continuum 
limit of (75) should be carried out on the level of ln Y rather than for Y itself,
which leads to a nonlinear drift-diffusion equation replacing (76) [27]. Recent 
applications of fitness space models that go beyond the present discussion 
include studies of the in vitro evolution of DNA sequences selected for protein 
binding [46], viral populations undergoing serial population transfers [91], and 
the effects of recombination in asexual populations [92]. 

6 Evolution in the laboratory 

Viruses and bacteria are suitable candidates for testing the theory of asexual
evolution due to their simple genomes and high replication rates. For instance,
RNA viruses, which are characterised by high mutation rates and small genomes
(see Table 2.1 and the chapter by E. Lazaro in this book), can produce about
10^ copies an hour. Their typical population numbers are of the order of 10^^, 
thus getting close to the infinite population condition for the applicability 
of quasispecies theory. Interestingly, evolution can also occur in non-living
systems such as RNA extracted from a bacteriophage, which we discuss in the
following subsection.

6.1 RNA evolution in vitro 

Early in vitro studies of adaptation to a given environment were carried out on
a simple system comprising RNA molecules and the enzyme RNA replicase,
which is required to catalyse the RNA replication reaction. In the first of a
series of experiments, the time interval during which the reaction is allowed to
proceed was gradually reduced with the number of generations, thus selecting
the rapidly growing molecules [93]. By the 74th generation, the initial baseline
strain with a genome length of a few thousand bases had evolved to a 15 times
faster replicating (but no longer pathogenic) chain of merely a few hundred
bases, by casting off the parts of the genome which do not participate in the
in vitro replication process. Subsequently, experiments using such short RNA
were performed under different conditions and selection pressures [94, 95]. In
particular, the formation of a quasispecies in which the master sequence makes
up only 40% of the population, accompanied by many mutants, has been
demonstrated [96].

6.2 Quasispecies formation in RNA viruses 

Inside a cell, a virus is subjected to the constantly changing environment of the 
host, whereas the quasispecies concept described in earlier sections assumes an 
infinite population evolving towards a stationary state in a static landscape. 
Nevertheless, evidence for quasispecies formation has been obtained in in vivo 
experiments on RNA viruses by examining their genetic heterogeneity [97], 
and the quasispecies concept now plays an important role in virology [98, 99, 
100]; for a detailed discussion we refer to the chapter by Ester Lazaro in this 
book. 

The first such experiment was performed on a Qβ phage population derived
from the wild type [25]. On sampling about 10% of its sequence, it was found
that on average, the genome of the derived phage differs from the wild type
at about two positions. Assuming a Poisson model for the distribution of
deviations from the wild type, only 14% of the population was found to be
wild type and the rest was accounted for by related mutants with up to 3-4
substitutions. Similarly, in the Hepatitis C virus, half of the RNA molecules
were found to be identical and the rest one to four mutations away from
each other [101]. In the case of HIV, the quasispecies concept has been used
to explain the reappearance of the virus after treatment with drugs that
target only the wild type [102]. Many experiments, such as [103] on poliovirus,
also show that RNA viruses operate close to the error threshold, since on a
modest increase in mutation rate (through chemicals), the virus population
was found to lose its genetic structure.
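The 14% figure quoted for the Qβ population follows directly from the Poisson assumption. As a short worked check, assuming a mean of exactly two substitutions per genome (the text only says "about two"), the fraction of unmutated genomes is $e^{-2} \approx 0.135$. The helper name `poisson_pmf` below is hypothetical.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k substitutions for a Poisson distribution."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


MEAN_SUBSTITUTIONS = 2.0  # assumed value, read off from "about two positions"

wild_type_fraction = poisson_pmf(0, MEAN_SUBSTITUTIONS)
up_to_four = sum(poisson_pmf(k, MEAN_SUBSTITUTIONS) for k in range(5))

if __name__ == "__main__":
    print("fraction of wild type genomes:", wild_type_fraction)
    print("fraction with at most 4 substitutions:", up_to_four)
```

Under the same assumption, genomes with at most four substitutions account for roughly 95% of the population, consistent with the statement that the remainder consists of mutants with up to 3-4 substitutions.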

6.3 Dynamics of microbial evolution 

The dynamics of adaptation have been studied in several long-term experi- 
ments on asexually reproducing microbes like viruses and bacteria. In exper- 
iments on E. coli [104, 105], several populations are derived from the same 
ancestor and allowed to replicate under identical conditions. The ancestor is 
engineered to have a selectively neutral marker so that it can be distinguished 
from the offspring colony. The process of evolution occurs because the progeny 
is grown in the presence of a limited supply of glucose, unlike the ancestor.

To measure the fitness of the evolved type, the ancestor and the evolved
progeny are made to compete for glucose by mixing them in equal amounts
at time t = 0 and estimating their respective densities $\rho_A$ and $\rho_P$ at t = 0
and t = 1, where time is measured in days. Then the Malthusian fitness of the
evolved type at any instant measured relative to the ancestor A is given by

$$\frac{\ln(\rho_P(1)/\rho_P(0))}{\ln(\rho_A(1)/\rho_A(0))}. \tag{77}$$
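As a small worked example of this relative fitness measure, the following Python sketch evaluates the ratio of logarithmic growth factors for made-up density values. The numbers and the function name `relative_fitness` are purely illustrative and are not taken from the experiments cited here.

```python
import math


def relative_fitness(rho_p0, rho_p1, rho_a0, rho_a1):
    """Ratio of log growth factors of evolved progeny (P) and ancestor (A)."""
    return math.log(rho_p1 / rho_p0) / math.log(rho_a1 / rho_a0)


# Hypothetical densities: equal amounts at t = 0; after one day the ancestor
# has grown 100-fold and the evolved type 200-fold.
example = relative_fitness(rho_p0=1.0e5, rho_p1=2.0e7,
                           rho_a0=1.0e5, rho_a1=1.0e7)

if __name__ == "__main__":
    print("relative fitness of the evolved type:", example)
```

A value above 1 indicates that the evolved type grows faster than the ancestor under the competition conditions; equal growth factors give exactly 1.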

The experiments indicate that the fitness of all populations improves in time, 
but each of the replicate populations reaches a different fitness level at large 
times. This supports the picture of a rugged fitness landscape (Fig. 4) with 
several peaks in which the population, starting from the same initial point, 
reaches different local maxima via different evolutionary trajectories. 

Initially fitness changes rapidly but slows down considerably in the course 
of time. When the same experimental data is viewed at a finer scale, the best 
fit to the data is obtained if the fitness increases are assumed to occur in steps. 
The occurrence of punctuated evolution is associated with the selection of rare 
beneficial mutations [106]. Although a large number of advantageous muta- 
tions with small effects may have occurred, a few mutations with large effects 
quickly spread through the population and are responsible for the jumps in 
the fitness. For a review of other experiments with this bacterial population 
see [9, 107]. 

The step-like nature of fitness trajectories, especially the properties of the
first step, has been investigated in detail in other experiments as well. For
instance, in [108], the distribution of the fitness conferred in the first step was
measured in E. coli, which supports the above observation of the occurrence
of few mutations with large benefits and many with small payoffs. Similar
experiments have also been performed on the RNA virus φ6 [109]. This study
tracked the fitness recovery in a population, after a deleterious mutation has
been induced by a population bottleneck, for about one hundred generations. The
fitness was seen to recover in steps but the number of steps (and the fitness
benefit) was found to depend strongly on the population size. While large
populations recovered in one large step, smaller populations required many
steps each granting small favors. As discussed in detail in the chapter by Ester
Lazaro, such population bottlenecks occur naturally in the life cycle of viruses,
because the number of viral particles that are transmitted from one host to
another is often very small.

Finally, we note that under certain conditions populations of RNA viruses 
display a linear increase or decrease of fitness with time [27, 110], which can 
be analyzed within the framework of the fitness space models discussed in 
Sect. 5.3. 



7 Conclusions 

In this chapter we have given an overview of a class of models of adaptive
evolution which include selection and mutation, but (due to their deterministic
character) ignore effects of genetic drift in finite populations. A large body
of work spread out over different scientific communities has been devoted to
such models, and our survey must necessarily remain quite incomplete. We
have therefore tried to focus on some general concepts - such as sequence
space, fitness landscapes, error thresholds and epistatic interactions - that we
believe to be useful also beyond the specific biological situations in which the
models apply.

Stochastic effects characteristic of finite populations are expected to be 
quantitatively and even qualitatively important for several of the phenomena 
we have described. Genetic drift induces a new mechanism of genetic degra- 
dation, Muller's ratchet [111], in which the fittest genotype is lost from the 
population because it is not sampled for reproduction. In the limit of infinite 
sequence length this process is irreversible, and it generally contributes to the 
delocalisation of the population from fitness peaks. Correspondingly, a com- 
mon result of finite population studies in simple [23, 37, 48] as well as complex 
[42, 72] landscapes is a lowering of the error threshold mutation rate with de- 
creasing population size. A comparison between Muller's ratchet and the error 
threshold in infinite population models can be found in [10, 50]. As described 
in the chapter by E. Lazaro, both mechanisms for genetic degradation are 
being considered as possible strategies for fighting viral infections. 

The finite size of the population is also crucially important for the peak
shifts in rugged landscapes discussed in Sect. 5.2, because it imposes a cutoff
on the tails of rare mutants which are responsible for the communication
between distant fitness peaks. Much of the analytic work on adaptive dynamics
that takes stochastic aspects into account has considered the regime of low
mutation rates^17, where the population consists of a single genotype at most
times and the generation and fixation of new mutations are rare events. In
these studies the geometrical constraints on the availability of new mutants in
sequence space are usually ignored, and the timing and fitness effects of
mutations are instead generated by a suitable stochastic process [112, 113]. An
important task for the future will be to integrate the different theoretical
approaches, with the ultimate goal of bringing them to bear on the experimental
data that are becoming available.

^17 The quantitative characterization of this regime is that the product of the
population size and the mutation probability per site is small compared to unity
[26].